
Mobile Application Rating Scale (MARS)
A new tool for assessing the quality of health mobile applications

Associate Professor Leanne Hides


Professor David Kavanagh
Stoyan Stoyanov
Dr Oksana Zelenko
Associate Professor Dian Tjondronegoro
Madhavan Mani

December 2014

Young and Well CRC


Unit 17, 71 Victoria Crescent
Abbotsford VIC 3067 Australia
youngandwellcrc.org.au
Mobile Application Rating Scale (MARS)
A new tool for assessing the quality of health mobile applications

Associate Professor Leanne Hides
QUT Institute of Health and Biomedical Innovation

Professor David Kavanagh
QUT Institute of Health and Biomedical Innovation

Stoyan Stoyanov
QUT Institute of Health and Biomedical Innovation

Dr Oksana Zelenko
QUT Creative Industries

Associate Professor Dian Tjondronegoro
QUT Science and Engineering Faculty

Madhavan Mani
QUT Institute of Health and Biomedical Innovation

Institute of Health and Biomedical Innovation (IHBI), School of Psychology and Counselling, Queensland
University of Technology (QUT), Brisbane, Australia

Creative Industries (CI), Queensland University of Technology (QUT), Brisbane, Australia

Science and Engineering Faculty (SEF), Queensland University of Technology (QUT), Brisbane, Australia

ISBN: 978-0-9925966-5-1

Suggested citation: Hides, L et al. 2014, Mobile Application Rating Scale (MARS): A new tool for assessing the
quality of health mobile applications, Young and Well Cooperative Research Centre, Melbourne.

Copies of this guide can be downloaded from the Young and Well CRC website youngandwellcrc.org.au

Copyright and Disclaimer


The standard Young and Well CRC Copyright and Disclaimer notice will be inserted in the document during final
formatting. If the report is a joint publication then copyright can be shared.

2 // Safe. Healthy. Resilient.


Acknowledgements

This project was funded by the Young and Well Cooperative Research Centre (Young and Well CRC). The Young
and Well CRC (youngandwellcrc.org.au) is an Australian-based, international research centre that unites young
people with researchers, practitioners, innovators and policy-makers from over 70 partner organisations.
Together, we explore the role of technology in young people’s lives, and how it can be used to improve the mental
health and wellbeing of young people aged 12 to 25. The Young and Well CRC is established under the
Australian Government’s Cooperative Research Centres Program.

We would like to acknowledge Associate Professor Susan Keys and Michael Gould for their assistance with the
development of the original version of the MARS.

Our gratitude goes out to Dimitrios Vagenas for his statistical advice.

Associate Professor Leanne Hides is supported by an Australian Research Council Future Fellowship.

Young and Well Cooperative Research Centre
The Young and Well Cooperative Research Centre is an Australian-based, international research centre that unites young people with researchers, practitioners, innovators and policy-makers from more than 70 partner organisations. Together, we explore the role of technology in young people’s lives, and how it can be used to improve the mental health and wellbeing of young people aged 12 to 25. The Young and Well CRC is established under the Australian Government’s Cooperative Research Centres Program.

youngandwellcrc.org.au

Queensland University of Technology (QUT)
QUT is a top Australian university with global connections and an applied emphasis in courses and research best suited to the needs of industry and the community. QUT has a reputation for quality undergraduate and postgraduate courses and has 42,000 students, including 6000 from overseas. Courses are in high demand and its graduate employment rate is well above the national average for Australian universities. The CRC for Young People, Technology and Wellbeing is QUT’s 10th CRC, and is a partnership between the Faculties of Health, Science and Engineering, Business and Creative Industries at QUT.

qut.edu.au



Table of contents
Executive summary
Introduction
Methods
Results
Discussion
References
Appendices
Glossary



Executive summary
Background
The use of mobile applications (apps) for health and wellbeing promotion has grown exponentially in recent years.
Yet there is currently no app quality assessment tool beyond “star” ratings.

Objective
This study aimed to develop a reliable, multidimensional measure for trialling, classifying and rating the quality of
mobile health applications.

Methods
A literature search was conducted to identify articles containing explicit web or app quality rating criteria published
between January 2000 and January 2013. Existing criteria for the assessment of app quality were categorised by
an expert panel to develop the new Mobile App Rating Scale (MARS) subscales, items, descriptors and anchors.
Sixty wellbeing apps identified via an iTunes search were randomly selected for MARS rating. Ten were used to pilot the
rating procedure, and the remaining 50 provided data on inter-rater reliability.

Results
A total of 427 explicit criteria for assessing web or app quality were extracted from 25 published papers, conference
proceedings, and online resources; 349 remained after duplicates and criteria irrelevant to mobile apps were removed.
Five broad categories of quality criteria were identified, comprising four objective quality scales (engagement,
functionality, aesthetics and information quality) and one subjective quality scale, which were refined into the
23-item MARS. The MARS demonstrated excellent internal consistency (α = .90) and inter-rater reliability (ICC = .79).

Conclusions
The MARS is a simple, objective and reliable tool for classifying and assessing the quality of mobile health apps.
It can also be used to provide a checklist for the design and development of new high quality health apps.



Introduction
The use of mobile applications (apps) for health and wellbeing promotion has grown exponentially in recent years
(Riley et al. 2011). Between 2013 and 2014, global smartphone use increased by 406 million, reaching
1.82 billion devices (up five percent in a year), and internet usage via mobile devices increased by 81
percent in one year (Cisco & Co 2014). In the first quarter of 2013 alone, 13.4 billion apps were downloaded (Daum
2013), with 102 billion projected for the whole year (Dredge 2013). The portability of smartphones
provides access to health information and interventions at any time in any context. The capabilities (for example,
sensors) of smartphones can also enhance the delivery of these health resources.

Given the rapid proliferation of smartphone apps, it is increasingly difficult for users, health professionals, and
researchers to readily identify and assess high quality apps (Cummings, Borycki & Roehrer 2013). Little
information on the quality of apps is available, beyond the star-ratings published on retailers’ webpages, and app
reviews are subjective by nature and may come from suspicious sources (Kuehnhausen & Frost 2013). Selecting
apps on the basis of popularity yields little or no meaningful information on app quality (Girardello & Michahelles
2010).

Much of the published literature focuses on technical aspects of websites, presented mostly in the form of
checklists, which do not assess the quality of these features (Aladwani and Palvia 2002; Olsina and Rossi 2002;
Seethamraju 2004). Website quality can be described as a function of: i) content, ii) appearance and multimedia,
iii) navigation, iv) structure and design, and v) uniqueness (Moustakis et al. 2004). A synthesis of website
evaluation criteria conducted by Kim et al. (1999) shortlisted 165 evaluation criteria, organised into 13 groups (for
example, design and aesthetics, ease of use). However, 33 criteria could not be grouped and were coded as
“miscellaneous”, highlighting the complexity of the task. While many website criteria may be applicable to mobile
apps, there is a need to consider whether a specific quality rating scale may be needed for apps.

Attempts to develop mobile health (mHealth) evaluation criteria are often too general, complex, or specific to a
particular health domain. Handel (2011) reviewed 35 health and wellbeing mobile apps based on user ratings of:
i) ease of use, ii) reliability, iii) quality, iv) scope of information, and v) aesthetics. While these criteria may cover
important aspects of quality, no rationale for these specific criteria was provided. Khoja et al. (2013) described the
development of a matrix of evaluation criteria, divided into seven themes for each of the four stages of an app’s
lifecycle: i) development, ii) implementation, iii) integration, and iv) sustained operation. While this matrix provides
comprehensive criteria for rating app quality, the complex and time-consuming nature of the evaluation scheme
would be difficult to apply in routine practice and research. Furthermore, the matrix omits any evaluation of the
visual aesthetics of the app as a criterion.

Guidelines for evaluating the usability of mHealth apps were also compiled by the Healthcare Information and
Management Systems Society (HIMSS 2012). While the criteria were extensive, and included usability criteria
for rating efficiency, effectiveness, user satisfaction, and platform optimisation, no criteria for rating information
quality were included. This is problematic as it is important to evaluate the quality and quantity of health
information contained in mHealth apps to avoid potential harm to users (Su 2014). The HIMSS guidelines also
use a “Strongly agree” to “Strongly disagree” Likert scale to rate each criterion, which does not indicate the
quality with which a criterion is met. Strong agreement that a criterion is met (that is, clarity in whether a feature is present) is
not necessarily equivalent to meeting the criterion to a high degree.

A multidimensional, reliable and objective instrument is needed to rate the degree that mHealth apps
satisfy quality criteria. This scale should be easy to understand and use, and ideally should be
applicable to the needs of app developers, researchers, and health professionals.

OBJECTIVES
To develop a reliable, multidimensional instrument for app developers, researchers, and health professionals to
trial, classify and rate the quality of mobile health apps.



Methods
1.1 MARS DEVELOPMENT
A comprehensive literature search was conducted to identify articles containing explicit web or app-related quality
rating criteria. English-language papers from January 2000 through January 2013 were retrieved from PsycINFO,
ProQuest, EBSCOhost, IEEE Xplore, Web of Science, and ScienceDirect. The search terms were: “mobile” AND
“app*” OR “web*” PAIRED WITH “quality” OR “criteria” OR “assess*” OR “evaluat*”.

Three key websites, including the EU’s Usability.net, Nielsen Norman Group’s user experience (UX) criteria, and
Healthcare Information and Management Systems Society (HIMSS) were searched for relevant information.
References of retrieved articles were also hand-searched. Professional research manuals, unpublished
manuscripts and conference proceedings were also explored for additional quality criteria. After initial screening of
title and abstract, only studies that reported quality assessment criteria for applications or web content were
included.

Website and app assessment criteria identified in previous research were extracted. Criteria irrelevant to mobile
content and duplicates were removed. An advisory team of psychologists, interaction and interface designers and
developers, and professionals involved in the development of mHealth apps worked together to classify
assessment criteria into categories and sub-categories, and develop the scale items and descriptors. Additional
items assessing the app’s description in the online store and its evidence base were added. Corrections were
made until agreement between all panel members was reached.

1.2 MARS TESTING ON MENTAL HEALTH APPS


A systematic search of the Apple iTunes store was conducted on 19 September 2013, following the PRISMA
guidelines for systematic literature reviews (Moher et al. 2010). An exhaustive list of mental health-related mobile
apps was created. The following search terms were employed: “Mindfulness” OR “Depression” OR “Wellbeing”
OR “Well-being” OR “Mental Health” OR “Anger” OR “CBT” OR “Stress” OR “Distress” OR “Anxiety”.

App inclusion criteria were: i) English language; ii) free of charge; iii) availability in the Australian iTunes store; iv)
from iTunes categories: “Health & Fitness”; “Lifestyle”; “Medical”; “Productivity”; “Music”; “Education”; “Utilities”.
The category inclusion criteria were based on careful scrutiny of the titles and types of applications present in
those categories.

Sixty apps were randomly selected using randomization.com. The first ten were used for training and piloting
purposes. Two expert raters (a research officer with a Research Masters in Psychology and two years’ experience
in mobile app development, and a PhD candidate with a Masters degree in Applied Psychology and over nine
years’ IT experience) trialled each of the first ten apps for a minimum of ten minutes and then independently
rated their quality using the MARS. The raters convened to compare ratings and address ambiguities in the scale
content until consensus was reached. The MARS was revised based on that experience, and the remaining 50
mental health and wellbeing related apps were trialled and independently rated.

1.3 STATISTICAL ANALYSES


A minimum sample size of 41 is required to establish, with 87 percent assurance, that the true inter-rater reliability
lies within 0.15 of a sample observation of 0.80 (based on 10,000 simulation runs; Zou 2011). The
sample size of 50 therefore provides substantial confidence in the estimation of the inter-rater reliability in the
current study. Data were analysed with SPSS version 21 (SPSS Inc., Chicago, IL, USA). The internal consistency
of the MARS quality subscales and total quality score was calculated using Cronbach's alpha. This indicates the
degree (correlations) to which items measuring the same general construct produce similar scores. Inter-rater
reliability of the MARS subscales and total score was determined by the intraclass correlation coefficient (ICC)
(Shrout & Fleiss 1979). This statistic allows for the appropriate calculation of weighted values of rater agreement
and accounts for proximity, rather than equality of ratings. A two-way mixed effects, average measures model
with absolute agreement was utilised (Hallgren 2012). The concurrent validity of the MARS total score with the
MARS overall star rating item and Apple iTunes app-store star rating of each app (reported on 19 September
2013) was determined.
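The internal-consistency statistic described above can be sketched in a few lines. This is a generic reimplementation of Cronbach’s alpha, not the authors’ SPSS procedure, and the example rating matrix is invented for illustration only.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_apps x n_items) matrix of item ratings.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item sample variance
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of row totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: 4 apps x 3 items on the 1-5 MARS scale
ratings = [[4, 4, 5],
           [3, 3, 3],
           [2, 3, 2],
           [5, 4, 5]]
print(round(cronbach_alpha(ratings), 2))
```

Higher alpha indicates that items rise and fall together across apps; the ICC reported in the study additionally accounts for agreement between the two raters and would normally be obtained from a statistics package.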

1.4 DEVELOPMENT OF THE MARS ‘APP USER’ VERSION


A non-professional ‘app user’ version of the MARS was also created to collect app user ratings without MARS
training (See Appendix 4). Young people and health professionals reviewed the MARS to simplify item content
and the language used throughout the scale. Three MARS items that required raters to collect additional
information outside the app itself were deleted: item 13, which required raters to read the app-store description;
item 14, which required them to identify the goal of the app; and item 19, which required them to evaluate the
app’s evidence base.



Results
2.1 MARS DEVELOPMENT
The search strategy yielded 25 publications, including peer-reviewed journal articles (N=14), conference
proceedings (N=8), and online resources (N=3) containing explicit mobile or web-related quality criteria. The
complete list of utilised resources is available in Appendix 1 (Papers, publications and materials used for MARS
criteria selection). A total of 427 criteria were extracted: 56 were removed as duplicates, and 22 were deemed
irrelevant to mobile applications. Through an iterative approach, the remaining 349 criteria were grouped into six
categories by the expert panel: one relating to app classification, four categories on objective app qualities
(engagement, functionality, aesthetics and information quality) and one on subjective app quality (see Table 1).

Table 1: Number of criteria for evaluation of mHealth app quality identified in the literature search

Criterion category Frequency %

App classification – confidentiality, security, registration, community, affiliation 12 3%

Aesthetics – graphics, layout, visual appeal 52 15%

Engagement – entertainment, customisation, interactivity, fit to target group, etc. 66 19%

Functionality – performance, navigation, gestural design, ease of use 90 26%

Information – quality, quantity, visual information, credibility, goals, description 113 32%

Subjective quality – worth recommending, stimulates repeat use, overall rating 16 5%

The classification category collects descriptive information on the app (for example, price, platform, rating) as
well as its technical aspects (for example, login, password-protection, sharing capabilities). Additional sections
collect information on the target age group of the app (if relevant) as well as information on what aspects of health
(including physical health, mental health, wellbeing) the app targets. These domains may be adapted to
include/exclude specific content areas as needed.

The app quality criteria were clustered within the engagement, functionality, aesthetics, information quality and
subjective quality categories, to develop 23 subcategories from which the 23 individual MARS items were
developed. Each MARS item used a five-point scale (1-Inadequate, 2-Poor, 3-Acceptable, 4-Good, 5-Excellent):
descriptors for these rating anchors were written for each item. In cases where an item may not be applicable for
all apps, an option of ‘Not applicable’ was included. The MARS items and rating descriptor terminology were
scrutinised by the expert panel, to ensure appropriate and consistent language was used throughout the scale.

The MARS is scored by calculating the mean scores of the engagement, functionality, aesthetics and information
quality objective subscales, and an overall mean app quality total score. Mean rather than total scores are used
because of ‘not applicable’ (NA) items. Additionally, mean scores are used to provide quality ratings
corresponding to the familiar format of star ratings. The subjective quality items can be scored separately as
individual items, or a mean subjective quality score. The MARS app classification section is for descriptive
purposes only.
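The scoring rules above can be sketched as follows. The function names and the example ratings are hypothetical, but the logic (subscale means that skip ‘not applicable’ items, averaged into an overall quality score that excludes the subjective items) follows the text.

```python
import math

def subscale_mean(item_scores):
    """Mean of item ratings on the 1-5 scale, skipping 'not applicable' (None)."""
    valid = [s for s in item_scores if s is not None]
    return sum(valid) / len(valid) if valid else math.nan

def mars_total(engagement, functionality, aesthetics, information):
    """Overall app quality score: mean of the four objective subscale means.

    Subjective quality is scored separately and excluded here, as in the MARS.
    """
    subs = [subscale_mean(s) for s in
            (engagement, functionality, aesthetics, information)]
    return sum(subs) / 4

# Hypothetical ratings for one app (None = 'not applicable')
engagement = [4, 3, 2, 4, 4]
functionality = [5, 4, 5, 5]
aesthetics = [4, 3, 3]
information = [4, 4, None, 3, 3, 4, None]
print(round(mars_total(engagement, functionality, aesthetics, information), 2))
```

Because the score is a mean on the same 1-5 scale as the items, it maps directly onto the familiar star-rating format.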

2.2 MARS TESTING ON MENTAL HEALTH APPS


A total of 1533 mobile applications were retrieved from the iTunes search. All duplicate, non-English and paid
apps were removed. Apps from the categories ‘games’; ‘books’; ‘business’; ‘catalog’; ‘entertainment’; ‘finance’;
‘navigation’; ‘news’; ‘social networking’; and ‘travel’ were also removed. Remaining apps were screened by title.
The app store descriptions of apps with unclear titles were reviewed prior to exclusion. App titles with the words
“magazine”, “mother”, “mum”, “job”, “festival”, “massage”, “shop” or “conference”, as well as company ads and
web apps were also excluded, as they were linked to irrelevant content. Sixty of the remaining 405 apps were
randomly selected for rating with the MARS (Figure 1).
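The title-screening and random-selection steps can be sketched as below. The keyword list comes from the text, while the function names and the seeded `random.sample` call are illustrative stand-ins for the manual screening and the randomization.com draw actually used.

```python
import random

EXCLUDED_TITLE_WORDS = {"magazine", "mother", "mum", "job", "festival",
                        "massage", "shop", "conference"}

def screen_by_title(app_titles):
    """Drop apps whose titles contain any excluded keyword (case-insensitive)."""
    return [t for t in app_titles
            if not any(w in t.lower() for w in EXCLUDED_TITLE_WORDS)]

def select_for_rating(app_titles, n=60, seed=2013):
    """Randomly select n apps for MARS rating from the screened pool."""
    return random.Random(seed).sample(app_titles, n)

titles = ["Calm Mind", "Wellbeing Magazine", "Stress Tracker",
          "Yoga Festival Guide"]
print(screen_by_title(titles))
```

In the study, apps with unclear titles additionally had their app-store descriptions reviewed before exclusion, a judgement step that a keyword filter alone cannot reproduce.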



Figure 1: Flow diagram of the app selection process

Identification: 1533 apps were identified in iTunes and screened; 1015 were excluded as duplicates or for not
satisfying the inclusion criteria, leaving 518.
Screening/eligibility: 113 apps were excluded by title, leaving 405, which were randomised; 327 were not selected
for testing, and 18 were excluded (faulty or with irrelevant content) and randomly replaced.
Included: 60 apps were rated with the MARS.

On attempting to rate the initial ten apps, it was found that one was faulty and could not be rated. MARS ratings of
the remaining nine apps indicated the scale had a high level of internal consistency (α = .78) and fair inter-rater
reliability (two-way mixed ICC = .57, 95 percent CI .41–.69). The ‘not applicable’ option was removed from items
within the engagement category, as this feature was considered to be an important and universal component of
all high-quality apps. The meaning of ‘visual information’ was clarified and the item rephrased. The stated or
inferred age target of ‘young people’ was defined as app users aged 16-25. The descriptor of ‘goals’ was clarified
to read: “Does the app have specific, measurable and achievable goals (specified in app store description or
within the app itself)?” to help distinguish it from the item ‘accuracy of app description’, which often relates to the
app’s goals. On the information subscale, raters found it difficult to determine when lack of information within an
app should be rated as ’not applicable’ or as a flaw: this item was therefore revised to require that information be
rated unless the apps were purely for entertainment. The final version of the MARS is provided in Appendix 3.

Independent ratings on the overall MARS total score of the remaining 50 mental health and wellbeing apps
demonstrated an excellent level of inter-rater reliability (two-way mixed ICC = .79; 95 percent CI .75-.83). The
MARS total score had excellent internal consistency (α = .90) and was highly correlated with the MARS star rating
item (#23), r(50) = .89, p < .001. Internal consistencies of the MARS subscales were also very high (α = .80-.89,
Median = .85), and their inter-rater reliabilities were fair to excellent (ICC = .50-.80; Median = .65). Detailed item
and subscale statistics are presented in Table 2.



Table 2: Inter-rater reliability and internal consistency of the MARS items and subscale scores, and
corrected item-total correlations and descriptive statistics of items, based on independent ratings of 50
mental health and wellbeing apps.

#   Subscale / Item                                       Corrected item-total r   Mean   SD

Engagement: α = 0.89, ICC = 0.80 (95% CI 0.73-0.85)
1   Entertainment                                         .63                      2.49   1.24
2   Interest                                              .69                      2.52   1.20
3   Customisation                                         .60                      2.27   1.15
4   Interactivity                                         .65                      2.70   1.22
5   Target group                                          .61                      3.41   0.93

Functionality: α = 0.80, ICC = 0.50 (95% CI 0.33-0.62)
6   Performance                                           .42                      4.00   0.93
7   Ease of use                                           .29                      3.93   0.87
8   Navigation                                            .48                      4.00   0.94
9   Gestural design                                       .48                      4.10   0.79

Aesthetics: α = 0.86, ICC = 0.61 (95% CI 0.46-0.72)
10  Layout                                                .56                      3.91   0.87
11  Graphics                                              .61                      3.41   0.92
12  Visual appeal: How good does the app look?            .60                      3.14   0.91

Information (a): α = 0.81, ICC = 0.79 (95% CI 0.71-0.84)
13  Accuracy of app description                           .67                      3.66   1.03
14  Goals                                                 .70                      3.43   1.10
15  Quality of information                                .47                      3.18   1.46
16  Quantity of information                               .58                      2.87   1.54
17  Visual information                                    .39                      1.35   1.89
18  Credibility                                           .46                      2.79   0.95
19  Evidence base (a)                                     n/a                      n/a    n/a

Subjective quality (b): α = 0.93, ICC = 0.83 (95% CI 0.75-0.88)
20  Would you recommend this app?                         .84                      2.31   1.17
21  How many times do you think you would use this app?   .82                      2.46   1.12
22  Would you pay for this app?                           .63                      1.31   0.60
23  What is your overall star rating of the app?          .89                      2.69   1.06

(a) Item 19 ‘Evidence base’ was excluded from all calculations, as it currently contains no measurable data.
(b) The subjective quality subscale was excluded from the total MARS ICC calculation.

Only 15 of the 50 mental health and wellbeing apps extracted from the iTunes app-store had received the five
user ratings required for a star rating to be displayed. These apps showed a moderate correlation between the
iTunes star rating and the total MARS score (r(15) = .55, p < .05).
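The concurrent-validity coefficients reported here are ordinary Pearson correlations. A minimal reimplementation, with invented example scores rather than the study’s actual ratings, is:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical MARS totals vs iTunes star ratings for five apps
mars_totals = [3.8, 2.5, 4.1, 3.0, 2.2]
star_ratings = [4.0, 3.0, 4.5, 3.5, 2.0]
print(round(pearson_r(mars_totals, star_ratings), 2))
```

With only 15 apps contributing star ratings, as in the study, such a coefficient carries a wide confidence interval, which is one reason the moderate correlation should be interpreted cautiously.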



Discussion
3.1 PRINCIPAL RESULTS
The MARS is the first mHealth app quality rating tool to provide a multidimensional measure of the app quality
indicators of engagement, functionality, aesthetics and information quality as well as app subjective quality. These
app quality indicators were extracted from previous research across the UX, technical, human-computer
interaction and mHealth literature, but had not previously been combined in a single framework. Previous
attempts to develop mobile app evaluation criteria have been too technical or specific to a particular health
domain. They have also not been developed and piloted in a systematic manner using an expert panel of health
professionals, designers and developers of health web and mobile apps. In contrast, the MARS is an easy-to-use
(with appropriate training), simple, objective, reliable and widely applicable measure of app quality, developed by
an expert multidisciplinary team. Although the generalisability of the MARS is yet to be tested, the MARS scale
can be modified to measure the quality of non-health related apps, allowing researchers and professionals to use
it in other domains. The MARS total mean score describes the overall quality of an app, while the mean
engagement, functionality, aesthetics and information quality subscale scores can be used to describe its specific
strengths and weaknesses.

The use of objective MARS item anchors and the high level of inter-rater reliability obtained in the current study
should allow health practitioners and researchers to use the scale with confidence. Both the app quality total
score and four app-quality subscales had high internal consistency, indicating that the MARS provides raters with
a reliable indicator of overall app quality as well as the quality of app engagement, functionality, aesthetics and
information quality. The exclusion of the subjective quality subscale from the overall mean app quality score, due
to its subjective nature, strengthens the objectivity of the MARS as a measure of app quality. Nevertheless, the
high correlation between the MARS quality total score and its overall star rating provides a further indication that it
is capturing perceived overall quality. It should be noted that the MARS overall star rating is likely to be influenced
by the prior completion of the 19 MARS app quality items. Nevertheless, the iTunes app store star ratings
available for 15 of the 50 rated mental health apps were only moderately correlated with the MARS total score.
This was unsurprising, given the variable criteria likely to be used by different raters, the subjective nature of
these ratings and the lack of reliability of the iTunes star ratings, as has been highlighted in previous research
(Kuehnhausen & Frost 2013). In addition, the MARS overall star rating score was only moderately correlated with
the iTunes app store star rating. The MARS star rating is likely to provide a more reliable measure of overall app
quality, as it is rated following completion of the entire MARS and is therefore informed by the preceding items.

If multiple MARS raters are utilised, it is recommended that raters develop a shared understanding of the target
group for the apps, clarify the meaning of any MARS items they find ambiguous and determine if all MARS items
and subscales are relevant to the specific health area of interest. App-quality ratings should be piloted and
reviewed until an appropriate level of inter-rater reliability or consensus ratings are reached.

Due to the generic nature of the mHealth app quality indicators included in the MARS, a number of “App Specific”
items have been added to the MARS to obtain information on the perceived impact of the app on the user’s
knowledge, attitudes and intentions related to the target health behaviour (see the App Specific section of the
MARS).

For convenience the MARS was piloted on iPhone, rather than Android apps. Since initial testing, however, the
scale has been applied to multiple Android apps and no compatibility issues were encountered. Thus, preliminary
data indicates the MARS is applicable to both iPhone and Android apps.

3.2 TRAINING AND RECOMMENDATIONS FOR USE


Training resources have been developed to ensure raters have understood the purpose of the MARS and how to
use it. The training provides information on how to rate the classification section of the MARS as well as each of
the quality scale items using their anchor points. Screenshots of existing mobile applications are used to illustrate
each MARS item, and examples of low and high quality apps on the anchor scale of each MARS item are
provided. The training slides are provided in Appendix 2 and are available from the corresponding author.



For the purpose of using the MARS, trainees are instructed to first visit the app store and review the app
description to identify information on the purpose, functionality, affiliations of the app developers as well as
information on how the app was developed and tested. If none of this information is provided, a quick Google
search using the app name is recommended to identify up-to-date information on the app. Google Scholar may
also need to be searched to identify information on the evidence base for the app available in the scientific
literature. The app should then be trialled for at least ten minutes to determine how easy it is to use, how well it
functions and whether the app does what it purports to do. The app settings should also be reviewed for further
information on the development and testing of the app as well as its security features.

Following training, trainees are encouraged to complete an app rating exercise. This requires trainees to
download an app and rate it using the MARS. This activity is designed to allow trainees to check their
understanding of the MARS classification, quality items and their anchor points. Consensus expert ratings of the
app are also provided for trainees to check the reliability of their MARS ratings.

The ‘MARS app user’ version was developed to collect user ratings without a need for professional expertise. This
scale does not require raters to undertake training or review the app store descriptions; however, it is
recommended that raters trial the app for at least ten minutes and check all components, buttons and functions of
the app for functionality and quality.

3.3 LIMITATIONS
While the original search strategy to identify the MARS app quality rating criteria was conducted using guidelines
for a systematic review, few peer-reviewed journal articles were identified. As a result, the search strategy was
expanded to include conference proceedings and online resources, which may not have been as extensively peer
reviewed. Suggested guidelines for scale development were followed (Oppenheim 1992) whereby a qualitative
analysis of existing research was conducted to extract app quality criteria and then develop app quality
categories, sub-categories, MARS items and their anchor ratings via a thematic review and expert panel ratings.
Despite these efforts, and the corrections made after piloting the scale, two MARS items on the functionality
subscale (‘ease of use’ and ‘navigation’) achieved only moderate levels of inter-rater reliability (ICC = 0.50). The
text ‘menu labels/icons’ was moved to the ‘navigation’ item, and ‘ease of use’ was clarified to rate the availability
of clear instructions. These items are currently being tested for future revisions of the scale.

The effectiveness of the mental health apps included in this study has not yet been evaluated in randomised
controlled trials. As a result, the MARS item ‘evidence base’ was not applicable to any of the apps in the current
study, and we were unable to test its performance. It is hoped that the applicability of this MARS item can be
tested as the evidence base for health apps develops. This important component of health-related mobile apps
should in future become a standard for all applications making health-related claims (Su 2014).

3.4 FUTURE RESEARCH


Future research is required to determine the suitability and reliability of the MARS across multiple health and other
app domains. The association of the app quality total and subscale scores with the concepts of User Experience,
Quality of Experience and Quality of Service requires further investigation. Future refinements of MARS
terminology and additional items are likely to be required, as the functionality of mobile apps progresses. It is
hoped that the current version of the MARS provides mHealth app developers with a checklist of criteria for
designing high-quality apps. With some modification, the MARS may also inform the development and quality
rating of health-related websites. Future research is also required to demonstrate the reliability of the MARS ‘app
user’ version.

3.5 CONCLUSION
The MARS provides a multidimensional, reliable and flexible app-quality rating scale for researchers, developers
and health professionals. Current results suggest that the MARS is a reliable measure of health app quality,
provided raters are sufficiently and appropriately trained.

13 // Safe. Healthy. Resilient.


References

Aladwani, AM & Palvia, PC 2002, Developing and validating an instrument for measuring user-perceived web
quality. Information & Management, vol 39, no 6, pp 467-476.

Cisco 2014, Cisco visual networking index: global mobile data traffic forecast update, 2013–2018, accessed
from http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-
vni/white_paper_c11-520862.html.

Cummings, E, Borycki, E & Roehrer, E 2013, Issues and considerations for healthcare consumers using mobile
applications. Studies in Health Technology and Informatics, vol 182, pp 227-231.

Daum, A 2013, 11% quarterly growth in downloads for leading app stores, accessed from
http://www.canalys.com/newsroom/11-quarterly-growth-downloads-leading-app-stores.

Dredge, S 2013, Mobile apps revenues tipped to reach $26bn in 2013, accessed from
http://www.theguardian.com/technology/appsblog/2013/sep/19/gartner-mobile-apps-revenues-report.

Girardello, A & Michahelles, F 2010, AppAware: which mobile applications are hot? Proceedings of the 12th
international conference on Human computer interaction with mobile devices and services.

Hallgren, KA 2012, Computing inter-rater reliability for observational data: an overview and tutorial. Tutorials in
quantitative methods for psychology, vol 8, no 1, pp 23-34.

Handel, MJ 2011, mHealth (mobile health) – using apps for health and wellness. Explore: The Journal of Science
and Healing, vol 7, no 4, pp 256-261.

Health Care Information and Management Systems Society 2012, Selecting a mobile app: evaluating the usability
of medical applications, mHIMSS App Usability Work Group.

Khoja, S, Durrani, H, Scott, RE, Sajwani, A & Piryani U 2013, Conceptual framework for development of
comprehensive e-health evaluation tool, Telemed J E Health, vol 19, no 1, pp 48-53.

Kim, P, Eng, TR, Deering, MJ & Maxfield, A 1999, Published criteria for evaluating health related web sites:
review, British Medical Journal, vol 318, no 7184, pp 647-649.

Kuehnhausen, M & Frost, VS 2013, Trusting smartphone apps? To install or not to install, that is the question.
Proceedings of the International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and
Decision Support, pp 25-28, San Diego, United States.

Moher, D, Liberati, A, Tetzlaff, J & Altman, DG 2010, Preferred reporting items for systematic reviews and meta-
analyses: the PRISMA statement. International Journal of Surgery, vol 8, no 5, pp 336-41.

Moustakis, V, Litos, C, Dalivigas, A & Tsironis, L 2004, Website quality assessment criteria. Proceedings of the
9th international conference of information quality, Nov 5-7, Boston, United States.

Olsina, L & Rossi, G 2002, Measuring web application quality with WebQEM. Multimedia, IEEE, vol 9, no 4, pp
20-29.

Oppenheim, AN 1992, Questionnaire design, interviewing and attitude measurement, New edition, London: Pinter
Publishers.

Riley, WT, Rivera, DE, Atienza, AA, Nilsen, W, Allison, SM & Mermelstein, R 2011, Health behavior models in the
age of mobile interventions: are our theories up to the task? Translational Behavioral Medicine, vol 1, no 1, pp 53-
71.

Seethamraju, R 2004, Measurement of user perceived web quality. Proceedings of the European Conference on
Information Systems, Finland.
Shrout, PE & Fleiss, JL 1979, Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, vol
86, no 2, pp 420-428.

Su, W 2014, A preliminary survey of knowledge discovery on smartphone applications (apps): principles,
techniques and research directions for e-health. Proceedings of the International Conference on Complex Medical
Engineering, June 26-29; Taipei, Taiwan.

Zou, GY 2011, Sample size formulas for estimating intraclass correlation coefficients with precision and
assurance. Statistics in medicine, vol 31, no 29, pp 3972.



Appendices
Appendix 1: Papers, publications and materials used for MARS criteria selection

Author / Year | Title | Contains a scale

Publications

Aladwani AM, Palvia PC; 2001 [1] | Developing and validating an instrument for measuring user-perceived web quality | Yes
Doherty G, Coyle D, Matthews M; 2010 [2] | Design and evaluation guidelines for mental health technologies | No
Eng TR; 2002 [3] | eHealth research and evaluation: Challenges and opportunities | No
Finstad K; 2010 [4] | The usability metric for user experience | Yes
Handel MJ; 2011 [5] | mHealth (Mobile health) – Using apps for health and wellness | No
Ho B, Lee M, Armstrong AW; 2013 [6] | Evaluation criteria for mobile teledermatology applications and comparison to major mobile teledermatology applications | No
Kay-Lambkin FJ, White A, Baker AL; 2011 [7] | Assessment of function and clinical utility of alcohol and other drug web sites: An observational, qualitative study | Yes
Khoja S, Durrani H et al.; 2013 [8] | Conceptual framework for development of comprehensive e-Health evaluation tool | No
Kim P, Eng TR et al.; 1999 [9] | Published criteria for evaluating health related web sites: review | No
Lavie T, Tractinsky N; 2004 [10] | Assessing dimensions of perceived visual aesthetics of web sites | Yes
Moshagen M, Thielsch M; 2012 [11] | A short version of the visual aesthetics of websites inventory | Yes
Oinas-Kukkonen H, Harjumaa M; 2008 [12] | A systematic framework for designing and evaluating persuasive systems | No
Olsina L, Rossi G; 2002 [13] | Measuring web application quality with WebQEM | No
Schulze K, Krömker H; 2010 [14] | A framework to measure user experience of interactive online products | No
Tuch AN, Roth SP, et al.; 2012 [15] | Is beautiful really usable? Toward understanding the relation between usability, aesthetics, and affect in HCI | No



Conference proceedings

Law ELC, Roto V et al.; 2009 [16] | Understanding, scoping and defining user eXperience: A survey approach | No
Moustakis V, Litos C et al.; 2004 [17] | Website quality assessment criteria | Yes
Seethamraju R; 2006 [18] | Measurement of user-perceived web quality | Yes
Väätäjä H, Koponen T, Roto V; 2009 [19] | Developing practical tools for user experience evaluation – a case from mobile news journalism | Yes
Vermeeren APOS, Law ELC et al.; 2010 [20] | User experience evaluation methods: Current state and development needs | Yes

Manuscripts

mHIMSS App Usability Work Group; 2012 [21] | Selecting a mobile app: Evaluating the usability of medical applications | Yes
Naumann F, Rolker C; 2000 [22] | Assessment methods for information quality criteria | No
Studies in Health Technology and Informatics; 2013 [23] | Issues and considerations for healthcare consumers using mobile applications | No

Websites

Nielsen J; 2003 [24] | Usability 101: Introduction to usability | No
www.usabilitynet.org; 2006 [25] | What is usability? | No
REFERENCES

1. Aladwani, A.M. and P.C. Palvia, Developing and validating an instrument for measuring user-perceived
web quality. Information & Management, 2002. 39(6): p. 467-476.
2. Doherty, G., D. Coyle, and M. Matthews, Design and evaluation guidelines for mental health
technologies. Interacting with computers, 2010. 22(4): p. 243-252.
3. Eng, T.R., eHealth research and evaluation: challenges and opportunities. Journal of health
communication, 2002. 7(4): p. 267-272.
4. Finstad, K., The usability metric for user experience. Interacting with Computers, 2010. 22(5): p. 323-
327.
5. Handel, M.J., mHealth (Mobile Health)—Using Apps for Health and Wellness. EXPLORE: The Journal of
Science and Healing, 2011. 7(4): p. 256-261.
6. Ho, B., M. Lee, and A.W. Armstrong, Evaluation Criteria for Mobile Teledermatology Applications and
Comparison of Major Mobile Teledermatology Applications. Telemedicine and e-Health, 2013.
7. Kay-Lambkin, F., et al., Assessment of function and clinical utility of alcohol and other drug web sites: An
observational, qualitative study. BMC public health, 2011. 11(1): p. 277.
8. Khoja, S., et al., Conceptual framework for development of comprehensive e-health evaluation tool.
Telemedicine and e-Health, 2013. 19(1): p. 48-53.
9. Kim, P., et al., Published criteria for evaluating health related web sites: review. BMJ, 1999. 318(7184): p.
647-649.
10. Lavie, T. and N. Tractinsky, Assessing dimensions of perceived visual aesthetics of web sites.
International journal of human-computer studies, 2004. 60(3): p. 269-298.
11. Moshagen, M. and M. Thielsch, A short version of the visual aesthetics of websites inventory. Behaviour
& Information Technology, 2013. 32(12): p. 1305-1311.
12. Oinas-Kukkonen, H. and M. Harjumaa, A systematic framework for designing and evaluating persuasive
systems, in Persuasive technology. 2008, Springer. p. 164-176.
13. Olsina, L. and G. Rossi, Measuring Web application quality with WebQEM. Multimedia, IEEE, 2002. 9(4):
p. 20-29.



14. Schulze, K. and H. Krömker. A framework to measure user experience of interactive online products. in
Proceedings of the 7th International Conference on Methods and Techniques in Behavioral Research.
2010. ACM.
15. Tuch, A.N., et al., Is beautiful really usable? Toward understanding the relation between usability,
aesthetics, and affect in HCI. Computers in Human Behavior, 2012. 28(5): p. 1596-1607.
16. Law, E.L.-C., et al. Understanding, scoping and defining user experience: a survey approach. in
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2009. ACM.
17. Moustakis, V., et al. Website Quality Assessment Criteria. in IQ. 2004.
18. Seethamraju, R., Measurement of User Perceived Web Quality. 2004.
19. Väätäjä, H., T. Koponen, and V. Roto. Developing practical tools for user experience evaluation: a case
from mobile news journalism. in European Conference on Cognitive Ergonomics: Designing beyond the
Product---Understanding Activity and User Experience in Ubiquitous Environments. 2009. VTT Technical
Research Centre of Finland.
20. Vermeeren, A.P., et al. User experience evaluation methods: current state and development needs. in
Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries.
2010. ACM.
21. Health Care Information and Management Systems Society, Selecting a Mobile App: Evaluating the
Usability of Medical Applications. 2012.
22. Naumann, F. and C. Rolker, Assessment methods for information quality criteria. 2000.
23. Cummings, E., E. Borycki, and E. Roehrer, Issues and considerations for healthcare consumers using
mobile applications. Studies in Health Technology and Informatics, 2013. 182: p. 227-231.
24. Nielsen, J., Usability 101: Introduction to usability. 2003.
25. www.usabilitynet.org. What is usability? http://www.usabilitynet.org/management/b_what.htm.

Appendix 2: Mobile Application Rating Scale (MARS) Training

Watch the MARS Training video modules online: http://bit.ly/marstraining

Appendix 3: Mobile Application Rating Scale

Instructions for use: The app should be trialled thoroughly for at least 10 minutes.
Determine: 1) how easy it is to use; 2) how well it functions; 3) whether it does what it purports to do.
Review: settings, developer information, external links, security features, etc.
Visit the app store, review the app description and identify information on the purpose and functionality of the app,
the affiliations of the app developers, how the app was developed and whether it has been tested.
Conduct an online search using the app name to identify up-to-date information on the app. Google Scholar may
also need to be searched for published evidence on the app in the scientific literature.

Mobile Application Rating Scale (MARS)


App Classification
The Classification section is used to collect descriptive and technical information about the app. Please review the
app description in iTunes / Google Play to access this information.

App Name: _______________________________________________________________________________

Rating this version: ___________________________ Rating all versions: ___________________________

Developer: _______________________________________________________________________________

N ratings this version: _________________________ N ratings all versions: _________________________

Version: _____________________________________ Last update: _________________________________

Cost - basic version:___________________________ Cost - upgrade version: _______________________

Platform: ☐ iPhone ☐ iPad ☐ Android

Brief description: __________________________________________________________________________

__________________________________________________________________________
Focus: what the app targets (select all that apply)
☐ Increase Happiness/Well-being
☐ Mindfulness/Meditation/Relaxation
☐ Reduce negative emotions
☐ Depression
☐ Anxiety/Stress
☐ Anger
☐ Behaviour Change
☐ Alcohol/Substance Use
☐ Goal Setting
☐ Entertainment
☐ Relationships
☐ Physical health
☐ Other _______________________________

Theoretical background/Strategies (select all that apply)
☐ Assessment
☐ Feedback
☐ Information/Education
☐ Monitoring/Tracking
☐ Goal setting
☐ Advice/Tips/Strategies/Skills training
☐ CBT – Behavioural (positive events)
☐ CBT – Cognitive (thought challenging)
☐ ACT – Acceptance commitment therapy
☐ Mindfulness/Meditation
☐ Relaxation
☐ Gratitude
☐ Strengths based
☐ Other ____________________________

Affiliations:
☐ Unknown ☐ Commercial ☐ Government ☐ NGO ☐ University

Age group (select all that apply)
☐ Children (under 12)
☐ Adolescents (13-17)
☐ Young Adults (18-25)
☐ Adults
☐ General

Technical aspects of app (select all that apply)
☐ Allows sharing (Facebook, Twitter, etc.)
☐ Has an app community
☐ Allows password protection
☐ Requires login
☐ Sends reminders
☐ Needs web access to function
App Quality Ratings
The rating scale assesses app quality on four dimensions. All items are rated on a 5-point
scale from 1 (Inadequate) to 5 (Excellent). Circle the number that most accurately
represents the quality of the app component you are rating. Please use the descriptors
provided for each response category.

SECTION A
Engagement – fun, interesting, customisable, interactive (e.g. sends alerts, messages, reminders,
feedback, enables sharing), well-targeted to audience
1. Entertainment: Is the app fun/entertaining to use? Does it use any strategies to increase
engagement through entertainment (e.g. through gamification)?
1 Dull, not fun or entertaining at all
2 Mostly boring
3 OK, fun enough to entertain user for a brief time (< 5 minutes)
4 Moderately fun and entertaining, would entertain user for some time (5-10 minutes total)
5 Highly entertaining and fun, would stimulate repeat use

2. Interest: Is the app interesting to use? Does it use any strategies to increase engagement by
presenting its content in an interesting way?
1 Not interesting at all
2 Mostly uninteresting
3 OK, neither interesting nor uninteresting; would engage user for a brief time (< 5 minutes)
4 Moderately interesting; would engage user for some time (5-10 minutes total)
5 Very interesting, would engage user in repeat use

3. Customisation: Does it provide/retain all necessary settings/preferences for the app’s features (e.g.
sound, content, notifications)?
1 Does not allow any customisation or requires settings to be input every time
2 Allows insufficient customisation limiting functions
3 Allows basic customisation to function adequately
4 Allows numerous options for customisation
5 Allows complete tailoring to the individual’s characteristics/preferences, retains all settings

4. Interactivity: Does it allow user input, provide feedback, contain prompts (reminders, sharing
options, notifications, etc.)? Note: these functions need to be customisable and not
overwhelming to receive the highest rating.
1 No interactive features and/or no response to user interaction
2 Insufficient interactivity, or feedback, or user input options, limiting functions
3 Basic interactive features to function adequately
4 Offers a variety of interactive features/feedback/user input options
5 Very high level of responsiveness through interactive features/feedback/user input options

5. Target group: Is the app content (visual information, language, design) appropriate for your
target audience?
1 Completely inappropriate/unclear/confusing
2 Mostly inappropriate/unclear/confusing
3 Acceptable but not targeted. May be inappropriate/unclear/confusing
4 Well-targeted, with negligible issues
5 Perfectly targeted, no issues found
A. Engagement mean score = __________

SECTION B

Functionality – app functioning, easy to learn, navigation, flow logic, and gestural design of app
6. Performance: How accurately/fast do the app features (functions) and components
(buttons/menus) work?
1 App is broken; no/insufficient/inaccurate response (e.g. crashes/bugs/broken features, etc.)
2 Some functions work, but lagging or contains major technical problems
3 App works overall. Some technical problems need fixing/Slow at times
4 Mostly functional with minor/negligible problems
5 Perfect/timely response; no technical bugs found/contains a ‘loading time left’ indicator

7. Ease of use: How easy is it to learn how to use the app; how clear are the menu labels/icons and
instructions?
1 No/limited instructions; menu labels/icons are confusing; complicated
2 Useable after a lot of time/effort
3 Useable after some time/effort
4 Easy to learn how to use the app (or has clear instructions)
5 Able to use app immediately; intuitive; simple

8. Navigation: Is moving between screens logical/accurate/appropriate/uninterrupted; are all
necessary screen links present?
1 Different sections within the app seem logically disconnected and random/confusing/navigation
is difficult
2 Usable after a lot of time/effort
3 Usable after some time/effort
4 Easy to use or missing a negligible link
5 Perfectly logical, easy, clear and intuitive screen flow throughout, or offers shortcuts

9. Gestural design: Are interactions (taps/swipes/pinches/scrolls) consistent and intuitive across
all components/screens?
1 Completely inconsistent/confusing
2 Often inconsistent/confusing
3 OK with some inconsistencies/confusing elements
4 Mostly consistent/intuitive with negligible problems
5 Perfectly consistent and intuitive

B. Functionality mean score = _______________

SECTION C

Aesthetics – graphic design, overall visual appeal, colour scheme, and stylistic consistency
10. Layout: Is arrangement and size of buttons/icons/menus/content on the screen appropriate or
zoomable if needed?
1 Very bad design, cluttered, some options impossible to select/locate/see/read; device display
not optimised
2 Bad design, random, unclear, some options difficult to select/locate/see/read



3 Satisfactory, few problems with selecting/locating/seeing/reading items or with minor
screen-size problems
4 Mostly clear, able to select/locate/see/read items
5 Professional, simple, clear, orderly, logically organised, device display optimised. Every design
component has a purpose

11. Graphics: How high is the quality/resolution of graphics used for buttons/icons/menus/content?
1 Graphics appear amateur, very poor visual design - disproportionate, completely stylistically
inconsistent
2 Low quality/low resolution graphics; low quality visual design – disproportionate, stylistically
inconsistent
3 Moderate quality graphics and visual design (generally consistent in style)
4 High quality/resolution graphics and visual design – mostly proportionate, stylistically consistent
5 Very high quality/resolution graphics and visual design - proportionate, stylistically consistent
throughout

12. Visual appeal: How good does the app look?


1 No visual appeal, unpleasant to look at, poorly designed, clashing/mismatched colours
2 Little visual appeal – poorly designed, bad use of colour, visually boring
3 Some visual appeal – average, neither pleasant, nor unpleasant
4 High level of visual appeal – seamless graphics – consistent and professionally designed
5 As above + very attractive, memorable, stands out; use of colour enhances app features/menus

C. Aesthetics mean score = _________________

SECTION D

Information – Contains high quality information (e.g. text, feedback, measures, references) from a
credible source. Select N/A if the app component is irrelevant.
13. Accuracy of app description (in app store): Does app contain what is described?
1 Misleading. App does not contain the described components/functions. Or has no description
2 Inaccurate. App contains very few of the described components/functions
3 OK. App contains some of the described components/functions
4 Accurate. App contains most of the described components/functions
5 Highly accurate description of the app components/functions

14. Goals: Does app have specific, measurable and achievable goals (specified in app store
description or within the app itself)?
N/A Description does not list goals, or app goals are irrelevant to research goal (e.g. using a game
for educational purposes)
1 App has no chance of achieving its stated goals
2 Description lists some goals, but app has very little chance of achieving them
3 OK. App has clear goals, which may be achievable.
4 App has clearly specified goals, which are measurable and achievable
5 App has specific and measurable goals, which are highly likely to be achieved

15. Quality of information: Is app content correct, well written, and relevant to the goal/topic of the
app?
N/A There is no information within the app
1 Irrelevant/inappropriate/incoherent/incorrect
2 Poor. Barely relevant/appropriate/coherent/may be incorrect
3 Moderately relevant/appropriate/coherent/and appears correct
4 Relevant/appropriate/coherent/correct



5 Highly relevant, appropriate, coherent, and correct

16. Quantity of information: Is the extent of information coverage within the scope of the app, and
comprehensive but concise?
N/A There is no information within the app
1 Minimal or overwhelming
2 Insufficient or possibly overwhelming
3 OK but not comprehensive or concise
4 Offers a broad range of information, has some gaps or unnecessary detail; or has no links to
more information and resources
5 Comprehensive and concise; contains links to more information and resources

17. Visual information: Is visual explanation of concepts – through charts/graphs/images/videos, etc.
– clear, logical, correct?
N/A There is no visual information within the app (e.g. it only contains audio, or text)
1 Completely unclear/confusing/wrong or necessary but missing
2 Mostly unclear/confusing/wrong
3 OK but often unclear/confusing/wrong
4 Mostly clear/logical/correct with negligible issues
5 Perfectly clear/logical/correct

18. Credibility: Does the app come from a legitimate source (specified in app store description or
within the app itself)?
1 Source identified but legitimacy/trustworthiness of source is questionable (e.g. commercial
business with vested interest)
2 Appears to come from a legitimate source, but it cannot be verified (e.g. has no webpage)
3 Developed by small NGO/institution (hospital/centre, etc.) /specialised commercial business,
funding body
4 Developed by government, university or as above but larger in scale
5 Developed using nationally competitive government or research funding (e.g. Australian
Research Council, NHMRC)

19. Evidence base: Has the app been trialled/tested; must be verified by evidence (in published
scientific literature)?
N/A The app has not been trialled/tested
1 The evidence suggests the app does not work
2 App has been trialled (e.g., acceptability, usability, satisfaction ratings) and has partially positive
outcomes in studies that are not randomised controlled trials (RCTs), or there is little or no
contradictory evidence.
3 App has been trialled (e.g., acceptability, usability, satisfaction ratings) and has positive
outcomes in studies that are not RCTs, and there is no contradictory evidence.
4 App has been trialled and outcome tested in 1-2 RCTs indicating positive results
5 App has been trialled and outcome tested in > 3 high quality RCTs indicating positive results

D. Information mean score = ________________ *

* Exclude questions rated as “N/A” from the mean score calculation.
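The scoring rules above (subscale means, with ‘N/A’ items excluded, averaged into an overall app quality score) can be sketched in Python as follows. The function names and example ratings are illustrative only, not part of the MARS:

```python
def subscale_mean(ratings):
    """Mean of one MARS subscale; items rated N/A are passed as None and excluded."""
    scored = [r for r in ratings if r is not None]
    if not scored:
        raise ValueError("all items rated N/A; subscale mean is undefined")
    return sum(scored) / len(scored)

def mars_quality_mean(engagement, functionality, aesthetics, information):
    """Overall app quality score: the mean of the four subscale means."""
    subscales = (engagement, functionality, aesthetics, information)
    return sum(subscale_mean(s) for s in subscales) / 4

# Hypothetical ratings for one app:
engagement = [4, 3, 5, 4, 4]                # items 1-5
functionality = [5, 4, 4, 4]                # items 6-9
aesthetics = [4, 4, 3]                      # items 10-12
information = [4, None, 4, 3, None, 4, 2]   # items 13-19; None = N/A

print(round(mars_quality_mean(engagement, functionality, aesthetics, information), 2))  # → 3.83
```

Note that N/A items reduce the divisor for the Information subscale rather than counting as zero, which is the intent of the exclusion rule above.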



App subjective quality

SECTION E

20. Would you recommend this app to people who might benefit from it?
1 Not at all. I would not recommend this app to anyone
2 There are very few people I would recommend this app to
3 Maybe. There are several people whom I would recommend it to
4 There are many people I would recommend this app to
5 Definitely. I would recommend this app to everyone

21. How many times do you think you would use this app in the next 12 months if it was relevant to
you?
1 None
2 1-2
3 3-10
4 10-50
5 >50

22. Would you pay for this app?


1 No
3 Maybe
5 Yes

23. What is your overall star rating of the app?


1 ★ One of the worst apps I’ve used
2 ★★
3 ★★★ Average
4 ★★★★
5 ★★★★★ One of the best apps I've used

Scoring

App quality scores for each section:

A: Engagement Mean Score = __________________________

B: Functionality Mean Score = __________________________

C: Aesthetics Mean Score = __________________________

D: Information Mean Score = ___________________________

App quality mean Score = __________________________

App subjective quality Score = _______________________

App-specific
These added items can be adjusted and used to assess the perceived impact of the app on the user’s knowledge,
attitudes, intentions to change as well as the likelihood of actual change in the target health behaviour.



SECTION F

1. Awareness: This app is likely to increase awareness of the importance of addressing [insert
target health behaviour]

Strongly disagree Strongly Agree


1 2 3 4 5

2. Knowledge: This app is likely to increase knowledge/understanding of [insert target health


behaviour]

Strongly disagree Strongly Agree


1 2 3 4 5

3. Attitudes: This app is likely to change attitudes toward improving [insert target health
behaviour]

Strongly disagree Strongly Agree


1 2 3 4 5

4. Intention to change: This app is likely to increase intentions/motivation to address [insert


target health behaviour]

Strongly disagree Strongly Agree


1 2 3 4 5

5. Help seeking: Use of this app is likely to encourage further help seeking for [insert target
health behaviour] (if it’s required)

Strongly disagree Strongly Agree


1 2 3 4 5

6. Behaviour change: Use of this app is likely to increase/decrease [insert target health behaviour]

Strongly disagree Strongly Agree


1 2 3 4 5



Appendix 4: Mobile Application Rating Scale – MARS ‘App User’ version.

Instructions for use:

Raters should:

1. Use the app and trial it thoroughly for at least 10 minutes;


2. Determine how easy it is to use, how well it functions and whether it does what it purports to do;
3. Review app settings, developer information, external links, security features, etc.

Scoring

SECTION

A: Engagement Mean Score = __________________________

B: Functionality Mean Score = __________________________

C: Aesthetics Mean Score = __________________________

D: Information Mean Score* = _________________________

* Exclude questions rated as “N/A” from the mean score calculation.


App quality mean score = (A + B + C + D) / 4 ____________

The App subjective quality scale can be reported as individual items or as a mean score, depending on the aims
of the research.
The App-specific items can be adjusted and used to obtain information on the perceived impact of the app on the
user’s knowledge, attitudes and intentions related to the target health behaviour.



Mobile Application Rating Scale (MARS)
‘App User’ Version

App Name: ___________________________________________________________

Circle the number that most accurately represents the quality of the app you are rating. All items are rated on a 5-
point scale from “1.Inadequate” to “5.Excellent”. Select N/A if the app component is irrelevant.

App Quality Ratings

SECTION A

Engagement – fun, interesting, customisable, interactive, has prompts (e.g. sends alerts, messages,
reminders, feedback, enables sharing)
1. Entertainment: Is the app fun/entertaining to use? Does it have components that make it
more fun than other similar apps?
1 Dull, not fun or entertaining at all
2 Mostly boring
3 OK, fun enough to entertain user for a brief time (< 5 minutes)
4 Moderately fun and entertaining, would entertain user for some time (5-10 minutes total)
5 Highly entertaining and fun, would stimulate repeat use

2. Interest: Is the app interesting to use? Does it present its information in an interesting way
compared to other similar apps?
1 Not interesting at all
2 Mostly uninteresting
3 OK, neither interesting nor uninteresting; would engage user for a brief time (< 5 minutes)
4 Moderately interesting; would engage user for some time (5-10 minutes total)
5 Very interesting, would engage user in repeat use

3. Customisation: Does it allow you to customise the settings and preferences that you would
like to (e.g. sound, content and notifications)?
1 Does not allow any customisation or requires settings to be input every time
2 Allows little customisation and that limits app’s functions
3 Basic customisation to function adequately
4 Allows numerous options for customisation
5 Allows complete tailoring to the user’s characteristics/preferences, remembers all settings

4. Interactivity: Does it allow user input, provide feedback, contain prompts (reminders,
sharing options, notifications, etc.)?
1 No interactive features and/or no response to user input
2 Some, but not enough, interactive features, which limits the app’s functions
3 Basic interactive features to function adequately
4 Offers a variety of interactive features, feedback and user input options
5 Very high level of responsiveness through interactive features, feedback and user input options

5. Target group: Is the app content (visuals, language, design) appropriate for your target
audience?
1 Completely inappropriate, unclear or confusing
2 Mostly inappropriate, unclear or confusing
3 Acceptable but not specifically designed for young people. May be
inappropriate/unclear/confusing at times
4 Designed for young people, with minor issues



5 Designed specifically for young people, no issues found

SECTION B

Functionality – app functioning, easy to learn, navigation, flow logic, and gestural design of app
6. Performance: How accurately/fast do the app features (functions) and components
(buttons/menus) work?
1 App is broken; no/insufficient/inaccurate response (e.g. crashes/bugs/broken features, etc.)
2 Some functions work, but lagging or contains major technical problems
3 App works overall. Some technical problems need fixing, or is slow at times
4 Mostly functional with minor/negligible problems
5 Perfect/timely response; no technical bugs found, or contains a ‘loading time left’ indicator (if
relevant)

7. Ease of use: How easy is it to learn how to use the app; how clear are the menu labels, icons
and instructions?
1 No/limited instructions; menu labels/icons are confusing; complicated
2 Takes a lot of time or effort
3 Takes some time or effort
4 Easy to learn (or has clear instructions)
5 Able to use app immediately; intuitive; simple (no instructions needed)

8. Navigation: Does moving between screens make sense? Does the app have all necessary links
between screens?
1 No logical connection between screens at all / navigation is difficult
2 Understandable after a lot of time/effort
3 Understandable after some time/effort
4 Easy to understand/navigate
5 Perfectly logical, easy, clear and intuitive screen flow throughout, and/or has shortcuts

9. Gestural design: Do taps/swipes/pinches/scrolls make sense? Are they consistent across all
components/screens?
1 Completely inconsistent/confusing
2 Often inconsistent/confusing
3 OK with some inconsistencies/confusing elements
4 Mostly consistent/intuitive with negligible problems
5 Perfectly consistent and intuitive

SECTION C

Aesthetics – graphic design, overall visual appeal, colour scheme, and stylistic consistency
10. Layout: Is the arrangement and size of buttons, icons, menus and content on the screen
appropriate?
1 Very bad design, cluttered, some options impossible to select, locate, see or read
2 Bad design, random, unclear, some options difficult to select/locate/see/read
3 Satisfactory, few problems with selecting/locating/seeing/reading items
4 Mostly clear, able to select/locate/see/read items
5 Professional, simple, clear, orderly, logically organised

11. Graphics: How high is the quality/resolution of graphics used for buttons, icons, menus and
content?

1 Graphics appear amateurish; very poor visual design – disproportionate, stylistically inconsistent
2 Low quality/low resolution graphics; low quality visual design – disproportionate
3 Moderate quality graphics and visual design (generally consistent in style)
4 High quality/resolution graphics and visual design – mostly proportionate, consistent in style
5 Very high quality/resolution graphics and visual design – proportionate, consistent in style
throughout

12. Visual appeal: How good does the app look?
1 Ugly, unpleasant to look at, poorly designed, clashing, mismatched colours
2 Bad – poorly designed, bad use of colour, visually boring
3 OK – average, neither pleasant, nor unpleasant
4 Pleasant – seamless graphics – consistent and professionally designed
5 Beautiful – very attractive, memorable, stands out; use of colour enhances app features/menus

SECTION D

Information – Contains high quality information (e.g. text, feedback, measures, references) from a
credible source
13. Quality of information: Is app content correct, well written, and relevant to the goal/topic of
the app?
N/A There is no information within the app
1 Irrelevant/inappropriate/incoherent/incorrect
2 Poor. Barely relevant/appropriate/coherent/may be incorrect
3 Moderately relevant/appropriate/coherent/and appears correct
4 Relevant/appropriate/coherent/correct
5 Highly relevant, appropriate, coherent, and correct

14. Quantity of information: Is the information within the app comprehensive but concise?
N/A There is no information within the app
1 Minimal or overwhelming
2 Insufficient or possibly overwhelming
3 OK but not comprehensive or concise
4 Offers a broad range of information, has some gaps or unnecessary detail; or has no links to
more information and resources
5 Comprehensive and concise; contains links to more information and resources

15. Visual information: Are visual explanations of concepts – through charts/graphs/images/videos,
etc. – clear, logical and correct?
N/A There is no visual information within the app (e.g. it only contains audio, or text)
1 Completely unclear/confusing/wrong or necessary but missing
2 Mostly unclear/confusing/wrong
3 OK but often unclear/confusing/wrong
4 Mostly clear/logical/correct with negligible issues
5 Perfectly clear/logical/correct

16. Credibility of source: Does the information within the app seem to come from a credible
source?
N/A There is no information within the app
1 Suspicious source
2 Lacks credibility
3 Not suspicious but legitimacy of source is unclear
4 Possibly comes from a legitimate source
5 Definitely comes from a legitimate/specialised source

SECTION E

App subjective quality
17. Would you recommend this app to people who might benefit from it?
1 Not at all – I would not recommend this app to anyone
2 There are very few people I would recommend this app to
3 Maybe – There are several people I would recommend this app to
4 There are many people I would recommend this app to
5 Definitely – I would recommend this app to everyone

18. How many times do you think you would use this app in the next 12 months if it was relevant
to you?
1 None
2 1-2
3 3-10
4 10-50
5 >50

19. Would you pay for this app?
1 No
3 Maybe
5 Yes

20. What is your overall (star) rating of the app?


1 " One of the worst apps I’ve used
2 ""
3 """ Average
4 """"
5 """"" One of the best apps I've used

SECTION F

App-specific

1. Awareness: This app has increased my awareness of the importance of addressing the health
behaviour

Strongly disagree 1 2 3 4 5 Strongly agree

2. Knowledge: This app has increased my knowledge/understanding of the health behaviour

Strongly disagree 1 2 3 4 5 Strongly agree

3. Attitudes: The app has changed my attitudes toward improving this health behaviour

Strongly disagree 1 2 3 4 5 Strongly agree

4. Intention to change: The app has increased my intentions/motivation to address this health
behaviour

Strongly disagree 1 2 3 4 5 Strongly agree

5. Help seeking: This app would encourage me to seek further help to address the health
behaviour (if I needed it)

Strongly disagree 1 2 3 4 5 Strongly agree

6. Behaviour change: Use of this app will increase/decrease the health behaviour

Strongly disagree 1 2 3 4 5 Strongly agree

Further comments about the app?

THANK YOU!



Glossary
MARS: Mobile Application Rating Scale
UX: User Experience
HIMSS: Healthcare Information and Management Systems Society
CBT: Cognitive Behavioural Therapy
mHealth: Mobile Health
