Chapter 5
Testing Perceptions
and Financial Viability
5.1. Introduction
This chapter describes a number of expert assessments and Cost Benefit
Analyses (CBAs) to answer the subquestion: what support can be found
for hypotheses about the viability of Multimedia Retrieval Systems (MRSs)
for Marketing & Sales (M&S)?
This chapter has two parts in which the five hypotheses, described in
the previous chapter, are tested.
The first part contains expert assessments of the value added of MM
and the relevance of success/risk factors, the viability of seven clusters
of MM teleservices, and the viability of a Multimedia Business Catalogue
(MBC) and a promotional CD-i. Thereafter, the results of a survey of potential
investors in MM teleshop services are given. Next, marketing research is
reviewed to assess the economic viability of MRSs for M&S.
The second part contains a retrospective CBA and ROI computation of
the IECT photo archive, making a case for the effectiveness and economic
viability of an MCA, and a prospective CBA and ROI computation of an MBC
for tele-ordering by Top 1000 accounts, giving an indication of the effectiveness
and economic viability of an MBC for tele-ordering.
Finally, the findings are summarised.
5.2. Test measurement, test
reliability and test validity
Since the first part of this chapter is about testing statistical hypotheses
using questionnaires for surveys and expert assessments, it is important
to look deeper into the issue of test measurement, test reliability and
test validity. A test is here used as an instrument for obtaining a sample
of opinions about MM, retrieval, success/risk factors, system effectiveness
and system viability.
Test measurements take place at the:
On a Likert-type scale, subjects usually respond to a statement by checking
either "strongly agree" (scored a "5"), "agree" (scored a "4"), "undecided"
(scored a "3"), "disagree" (scored a "2") or "strongly disagree" (scored
a "1"). An example is given below.
Scarcity of multidisciplinary expertise is a typical risk factor
for multimedia projects.
Strongly disagree 1----------2----------3----------4----------5 Strongly
agree
Depending on the question, variations are used, by offering Likert-type
scales ranging from "very positive" to "very negative" etc. Traditionally,
psychologists have assumed that Likert-type scales yield interval data,
meaning that there is an equal psychological interval between each consecutive
number. The advantage of Likert-type scales is that they yield more information
than nominal-dichotomous items. More important, Likert-type items can be
analysed by more powerful statistical tests, such as the t-test and ANOVA,
than nominal-dichotomous items (Mitchell & Jolley, 1992). Another possibility
of Likert-type scales is that answers to items measuring the same variables
can be summated. An important advantage of this is that tests with summated
scores are more reliable than one-question tests.
For these reasons, nominal-dichotomous items are used only sparingly
in the expert assessment tests. Where they are used, it is because the answers
to the item are not intended to test a statistical hypothesis, or because the use
of interval scales would look too artificial.
The significance levels used are
α=.05 (p values of less than .05 are significant) and α=.01
(p values of less than .01 are highly significant). Non-significant (NS)
p values are not presented.
The known methodological weaknesses of psychological tests are the threats
to validity, "does the test measure what it purports to measure?", and
reliability, "are the test results consistent?". Experts were specifically
selected in several cases, because it is believed that they hold reasonable,
consistent, opinions, and that they are able to make better informed judgements
than a layman.
Reliability
Reliability, i.e., the repeatability of any measurement of a variable,
is extremely important. Reliability is the consistency of test results,
including the tendency of a test or measurement to produce the same results
when it measures twice some entity or attribute, believed not to have changed
in the interval between the measurements (Kidder, 1981).
Test-retest reliability is a convenient 'interpretation' of test reliability
(Kidder, 1981). Yet, there are two major problems with test-retest reliability
estimates (Allen & Yen, 1979):
Carry-over or learning effects: the first testing
may influence the second testing.
Time interval effects: long time intervals make effects due
to changes in information or moods more likely. New market information
or decisions on project budgets may influence the perceptions of the subjects
between tests. As a consequence test reliability tends to be underestimated.
To estimate the test reliability another approach is also used: split-half
test reliability. The test groups are split (odd/even) into two halves,
and the correlation between the two halves is determined. The outcomes
tend to be somewhat lower than test-retest reliability estimates of the
whole test due to the smaller subgroups. To overcome this problem the Spearman-Brown
Formula is used if the halves of the test are parallel (Allen & Yen,
1979).
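The Spearman-Brown step can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually used for the analyses; the function name `spearman_brown` and the `length_factor` parameter are assumptions introduced here. For two parallel halves, the estimated reliability of the whole test is 2r / (1 + r), where r is the correlation between the halves.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Estimate whole-test reliability from the correlation between parallel parts.

    length_factor=2 is the split-half case: 2r / (1 + r).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Split-half correlations reported later in this chapter (tables 13 and 17).
for r in (0.57, 0.80, 0.83):
    print(f"split-half r = {r:.2f} -> reliability = {spearman_brown(r):.2f}")
```

Applied to the split-half correlations of 0.57, 0.80 and 0.83 reported in the following sections, the formula gives 0.73, 0.89 and 0.91, which agrees with the tabulated reliability estimates.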
If item scores need to be summated it is useful to determine Cronbach's
coefficient α (Allen & Yen, 1979)
for item homogeneity.
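As an illustration, Cronbach's coefficient can be computed from a respondents-by-items score matrix as k/(k-1) times one minus the ratio of the summed item variances to the variance of the summated scores. The sketch below assumes complete data; the function name `cronbach_alpha` and the response matrix are hypothetical and are not taken from the questionnaires in the appendices.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


# Hypothetical Likert responses (1-5): five respondents, three items.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
]
print(round(cronbach_alpha(responses), 2))
```

Values close to 1 indicate that the items vary together, which supports summating them into one score.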
These three ways of estimating test reliability are used for the tests
described in following sections.
Validity
Test validity is related to the question: do you measure what you want
to measure? In this case: can we measure viability and related factors?
It can be argued that sometimes an attempt is made to measure the unmeasurable,
to predict future viability.
When performing expert assessment tests, one should be well aware of some
validity threats.
Test leader effects (Kidder, 1981): the biases of a test
leader influence the experts, i.e., lead to subject biases (Mitchell &
Jolley, 1992), thus influencing the outcome in the expected direction.
For example, a test leader who shows little enthusiasm will probably get
less positive responses about an MRS than a more enthusiastic test leader.
Demonstration system effects: in the case where a demonstration
is given, the quality of the demonstration influences the responses. For
example, a very good demonstration of an MRS will lead to a different perception
of its viability than a bad demonstration.
Order effects (Kidder, 1981): the order of the stimuli presented
to the expert panel may lead to unwanted interactions.
An attempt was made to reduce test leader effects by standardising the
testing procedures and demonstration system effects by giving only 'realistic'
demonstrations. Order effects are difficult to avoid completely in business
settings.
Therefore, it is important to assess the validity of the tests, apart
from more qualitative approaches with regard to content validity (e.g.,
face validity and logical validity) (Allen & Yen, 1979). Predictive
validity (Allen & Yen, 1979) is of interest, as are external
validity (Mitchell & Jolley, 1992), and concurrent validity (Allen
& Yen, 1979). Let us have a closer look at the possibility of obtaining
such viability estimates.
The predictive validity of the tests used can only be measured
by a longitudinal study: we must be able to wait 5 or 10 years and then
look back and evaluate whether the MRS for M&S were as viable as expected.
Such a longitudinal study is well beyond the scope of this research, and
moreover, after 5 or 10 years the results of a longitudinal research design,
however interesting from a methodological point of view, will no longer
hold much interest for decision makers as new types of system will have
evolved, making the results obsolete.
Another interesting issue is external validity, i.e., to what
degree can test results be generalised to other settings, subjects and
times? In my research generalisation to other times is not very relevant,
because it is assumed that the viability of MRSs for M&S varies with
time; they have a certain life cycle. Generalisation to other settings
and subjects is, however, of interest.
The concurrent validity of a test is demonstrated by a test and
criterion scores when both measurements are obtained at (about) the same
time. The concurrent validity of a test to measure the viability of an
MRS over time can be obtained, for example, by making a comparison with
market forecasts from independent market researchers (the criterion scores).
An attempt was made to estimate concurrent and external validity for
several of the tests described in the following sections.
5.3. Perceptions of the value added
of MM and success/risk factors.
5.3.1. Introduction
A survey of MM projects was performed to test hypotheses about the value
added of MM, the typical risk factors for MM projects and the risk factors
for the introduction of MM products and services. With regard to the risk
factors, only a small subset of factors were selected which were believed
to be MM specific.
With regard to the value added of MM the hypotheses tested are
summarised in the table below. These hypotheses are related to the H1 hypothesis
discussed in chapter 4. Four alternative hypotheses were formulated which
apply to M&S situations in which an effective information and knowledge
transfer is needed.
| No | Alternative hypothesis |
| 1-1 | The user friendliness of an IS improves with MM. |
| 1-2 | The presentation of information improves by using audio and video. |
| 1-3 | An MM message is understood better than a textual message. |
| 1-4 | Service to customers improves by using MSs. |
Table 9. Overview of alternative hypotheses with regard to
the value added of MM.
A general hypothesis (H4) was formulated in the previous chapter: the identified
project management and system success/risk factors are critical for the
viability of MRSs for M&S.
With regard to the risk factors for MM projects, four testable
hypotheses were formulated about factors which seem to be typical for MM projects:
complexity (4-5), scarcity of multidisciplinary expertise (4-6), high production
costs (4-7) and too little standardisation (4-8).
| No | Alternative hypothesis |
| 4-5 | Complexity is a typical risk factor for MM projects. |
| 4-6 | Scarcity of multidisciplinary expertise is a typical risk factor for MM projects. |
| 4-7 | High production costs for audio-visual information is a typical risk factor for MM projects. |
| 4-8 | Too little standardisation of MM products is a typical risk factor for MM projects. |
Table 10. Overview of alternative hypotheses with regard to
MM project management risk factors.
The hypotheses about risk factors for the introduction of MM products
and services are also related to the general success/risk factors hypothesis
H4 discussed in the previous chapter. The hypotheses about costs (4-9,
4-10 and 4-11) are related to the category of system success/risk factors
'system and usage costs'. Unstable standards (4-13) and becoming outdated
quickly (4-14) are related to the category 'technical reliability'. Too
little dissemination of use (4-12) is added because it seems to be an inhibiting
factor in a market which is still in its infancy. For example, if only
a few people use certain MSs, then the market for certain MR products and
services is very small. It is very difficult for a supplier of such products
to reach an acceptable ROI in such markets.
| No | Alternative hypothesis |
| 4-9 | High costs of hardware and software are a risk factor for the introduction of MM products and services. |
| 4-10 | High costs of using information services are a risk factor for the introduction of MM products and services. |
| 4-11 | High costs of telecommunication are a risk factor for the introduction of MM products and services. |
| 4-12 | Too little dissemination of use is a risk factor for the introduction of MM products and services. |
| 4-13 | Unstable standards are a risk factor for the introduction of MM products and services. |
| 4-14 | Products and services becoming outdated quickly are a risk factor for the introduction of MM products and services. |
Table 11. Overview of alternative hypotheses with regard to
the introduction of MM products and services
5.3.2. Method
Perceptions or opinions about the value added and risk factors were
measured.
A survey (n=20) was performed from August to December 1993 with respondents
from MM projects, mostly project leaders. About half of the respondents had participated
in commercial projects (n=9) and the others in research projects (n=11).
To test the hypotheses the experts were confronted with a number of
statements about the value added of MM, and typical risk factors for MM
projects and for the introduction of MM products and services (see appendix
C.3.). Likert-type scales were used for the estimations that ran from 1
(Strongly disagree) to 5 (Strongly agree). The middle value 3 is 'neutral'.
An example of a question with regard to a statement is:
1.1. The user friendliness of an information system improves with
MM
Strongly disagree 1----------2----------3----------4----------5 Strongly
agree
An example of a question with regard to a risk factor is:
2. The following risk factors are typical for multimedia projects:
2.1. Complexity
Strongly disagree 1----------2----------3----------4----------5 Strongly
agree
5.3.3. Results
A summary of the t-test results with regard to the hypotheses Hx-1 to Hx-14
is given in table 12. In general, all means were in the direction hypothesised
(>3), although not always at a significance level of
α=.05.
With regard to the (perceived) value added of MM all the null hypotheses
can be rejected. The null hypotheses H1-1₀ that the user friendliness of
an IS does not improve with MM (p<.01, t=3.53, df=19), H1-2₀ that the
presentation of information does not improve by using audio and video
(p<.01, t=4.05, df=19), H1-3₀ that an MM message is not understood better
than a textual message (p<.05, t=2.36, df=19), and H1-4₀ that the service
to customers does not improve by using MSs (p<.01, t=5.57, df=19)
can be rejected in favour of the respective alternative hypotheses. MM
is perceived to have value added.
| No | Statement | Mean | s² | s | t | p< |
| H1-1 | The user friendliness of an IS improves with MM | 3.9 | 1.35 | 1.17 | 3.53 | .01 |
| H1-2 | Presentation of information improves by adding audio and video | 4.0 | 1.09 | 1.05 | 4.05 | .01 |
| H1-3 | An MM message is better understood than a textual message | 3.7 | 1.58 | 1.28 | 2.36 | .05 |
| H1-4 | The service to customers improves by using MSs | 4.1 | 0.76 | 0.86 | 5.57 | .01 |
| H4-5 | Complexity is a risk factor | 3.9 | 1.33 | 1.17 | 3.45 | .01 |
| H4-6 | Scarcity of multidisciplinary expertise | 4.0 | 1.29 | 1.15 | 3.71 | .01 |
| H4-7 | High production costs for audio-visual information | 3.4 | 1.57 | 1.28 | 1.44 | NS |
| H4-8 | Too little standardisation of MM products | 3.7 | 1.16 | 1.09 | 2.96 | .01 |
| H4-9 | Costs of hardware and software | 3.5 | 1.14 | 1.09 | 1.83 | .05 |
| H4-10 | Costs of using information services | 3.2 | 0.69 | 0.86 | 1.10 | NS |
| H4-11 | Costs of telecommunication | 3.6 | 1.51 | 1.26 | 1.97 | .05 |
| H4-12 | Too little dissemination of the use | 3.5 | 1.17 | 1.10 | 1.81 | .05 |
| H4-13 | Unstable standards | 3.1 | 1.39 | 1.21 | 0.21 | NS |
| H4-14 | Fast outdating of products and services | 3.1 | 1.25 | 1.15 | 0.41 | NS |
Table 12. Overview of reactions to statements about MM by MM project
members
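The t values in table 12 can be approximately re-derived from the tabulated means and standard deviations. The sketch below assumes a one-tailed, one-sample t-test against the neutral scale value 3 with n=20; the helper names `one_sample_t` and `significance` are introduced here for illustration, and the critical values are the usual one-tailed t-table entries for df=19.

```python
import math

# One-tailed critical values of Student's t for df = 19 (standard t table).
T_CRIT_DF19 = {".01": 2.539, ".05": 1.729}


def one_sample_t(mean, sd, n, mu0=3.0):
    """t statistic for testing a sample mean against the neutral value mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))


def significance(t, critical=T_CRIT_DF19):
    """Return the strictest level reached ('.01' or '.05'), else 'NS'."""
    for level in (".01", ".05"):
        if t > critical[level]:
            return level
    return "NS"


# Rounded mean and standard deviation as printed in table 12 (n = 20).
for label, mean, sd in [("H1-4", 4.1, 0.86), ("H4-7", 3.4, 1.28)]:
    t = one_sample_t(mean, sd, n=20)
    print(label, round(t, 2), significance(t))
```

Because the printed means are rounded to one decimal, the recomputed t values differ slightly from the tabulated ones (about 5.7 against 5.57 for H1-4), but the significance classification is the same.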
The experts agreed (mean ≈ 4) with the statements that the user friendliness
of an IS improves with MM (mean=3.9), that the presentation of
information improves when using audio and video (mean=4.0), and that
service to customers improves by using MSs (mean=4.1). They tended to agree
with the statement that an MM message is better understood than a textual
one (mean=3.7), although it was remarked
that this depends on the type of message and the type of audience.
Figure 37. Bar chart
showing mean reactions to statements for the value added of MM
With regard to the (perceived) risk factors for MM projects three out
of four null hypotheses can be rejected in favour of the alternative hypotheses.
As expected, the null hypotheses H4-5₀ that complexity is not a typical
risk factor for MM projects (p<.01, t=3.45, df=19), H4-6₀ that scarcity
of multidisciplinary expertise is not a typical risk factor for MM projects
(p<.01, t=3.71, df=19), and H4-8₀ that too little standardisation of
MM products is not a typical risk factor for MM projects (p<.01, t=2.96,
df=19) can be rejected in favour of the respective alternative hypotheses.
Only the null hypothesis H4-7₀ that high production costs of audio-visual
information are not a typical risk factor for MM projects (NS, t=1.44,
df=19) cannot be rejected, although the outcome is in the expected direction
(mean=3.4). The high variance (1.57) for high production costs indicates that
there was little consensus about the degree to which they are a risk factor.
The experts agreed (mean ≈ 4) that complexity (mean=3.9) and scarcity
of multidisciplinary expertise (mean=4.0) are risk factors for MM projects,
and tended to agree that too little standardisation of MM products (mean=3.7)
is a risk factor as well.
Figure 38. Bar chart
showing mean reactions for typical risk factors for MM projects
With regard to the (perceived) risk factors for the introduction of
MM products and services three out of six null hypotheses can be rejected
in favour of the alternative hypotheses. The null hypotheses H4-9₀ that
the costs of hardware and software are not a risk factor for the introduction
of MM products and services (p<.05, t=1.83, df=18), H4-11₀ that the
costs of telecommunication are not a risk factor for the introduction of
MM products and services (p<.05, t=1.97, df=18), and H4-12₀ that too
little dissemination of the use is not a risk factor for the introduction
of MM products and services (p<.05, t=1.81, df=18) can be rejected in
favour of the respective alternative hypotheses, as expected. The null hypotheses
H4-10₀ that the costs of using information services are not a risk factor
for the introduction of MM products and services (NS, t=1.10, df=18),
H4-13₀ that unstable standards are not a risk factor for the introduction
of MM products and services (NS, t=0.21, df=17), and H4-14₀ that fast outdating
of products and services is not a risk factor for the introduction of MM
products and services (NS, t=0.41, df=18) cannot be rejected: the outcomes
are in the expected direction, but not significantly so (means of 3.2, 3.1
and 3.1 respectively).
The experts tended to agree (mean ≈ 3.5) that the costs of hardware and
software (mean=3.5), the costs of telecommunication (mean=3.6),
and too little dissemination of use (mean=3.5)
are risk factors for the introduction of MM products and services.
Figure 39. Bar chart
with mean reactions with regard to typical risk factors for the introduction
of MM products and services.
Reliability
The reliability coefficient of the test was estimated using the Spearman-Brown
Formula. The reliability estimate of 0.73 is not very high. This implies
that we should be careful when interpreting the test outcomes.
| Split-half correlation coefficient | 0.57 |
| Spearman-Brown coefficient | 0.73 |
Table 13. Computation of test reliability estimates using
the Spearman-Brown Formula.
It is assumed that the average test item outcomes for research respondents
(mean=3.7) and non-research respondents (mean=3.6) stem from the same
population, i.e., that the test means are equal. A two-sample t-test shows
that this assumption need not be rejected (p=.61, t=-0.52, df=14). (At item level,
12 of 14 items did not show significant differences between the two groups;
see appendix C.3.).
5.3.4. Discussion
My expectations about the value added of MM were confirmed by the respondents,
meaning that it is also their opinion that, in general, MM has value added
(H1): MM improves the user friendliness of an IS, audio and video improve
the presentation of information, an MM message is better understood than
a textual one, and service to customers improves with the use of MM. These
opinions are probably only valid to a certain degree; they can be validated
by usability testing and by effectiveness measurements.
My expectation that complexity, scarcity of multidisciplinary expertise,
and too little standardisation of MM products are typical risk factors
for MM projects was confirmed by the respondents (H4). Less consensus
exists about the high production costs of audio-visual information: some
respondents agreed and some disagreed that this is a typical risk factor
for MM projects. A possible explanation is that the costs of audio-visual
information production are relatively easy to assess, and that they form
only a limited part of the total development costs. Moreover, in many MM
projects no new audio-visual information is produced; already available
audio-visual information is re-used.
For the introduction of MM products and services the respondents believe,
as expected, that the costs of hardware and software, the costs of telecommunication
and too little dissemination of the use are risk factors.
These latter risk factors are not only typical for MM products and services.
It is interesting to note that too little standardisation is seen as
a risk factor for MM projects, but that unstable standards are seen as
'neutral' for the introduction of MM products and services. Perhaps this
is because standards are always unstable and evolving (see chapter
2), and because whether products and services are successful depends on
their other qualities (e.g., the price/performance ratio).
The validity of risk factors can be further analysed by comparing successful
and unsuccessful projects, and products and service introductions.
5.4. Expert assessment
of the viability of MM teleservices
5.4.1. Introduction
A market survey on tele-applications resulted in a clustering of MM
teleservices which are believed to be viable. These clusters included a
teleshopping/telemarketing cluster (Peeters & Koenen, 1993). These
clustered MM teleservices may form applications in a future Virtual Market
(VM). They include several of the MRSs (see chapter 3) with telecommunication
extensions. For example, an extended TSA, MBC or MPS may belong to the
cluster teleshopping/telemarketing. Although the teleshopping/telemarketing
cluster is particularly relevant to this research, the other clusters are
relevant as well. An on-line accessible MCA belongs to the electronic publishing/information
retrieval cluster, while several of the MRSs, e.g., the MDA or MAI, belong
clearly to the cluster office/process automation.
The question is, based on the viability hypothesis H3: do the recognised
MM clusters contain viable telecommunication applications in the short
term (0-2 years), in the medium term (2-5 years) and in the long term (5-10
years)? Since our particular attention is focused on M&S, our focus
must be on the judgements for teleshopping and telemarketing.
The alternative hypotheses can be formulated, in natural language, as
follows:
H3-1₁: The MM clusters are important for the telecommunication business.
H3-2₁: MM teleshopping/telemarketing is important for the telecommunication
business.
It can also be hypothesised that there is a strong positive relationship
between the importance of an MM cluster, or MM teleshopping/telemarketing
in particular, and time:
5.4.2. Method
In December 1993 a meeting with 18 MM experts of PTT Research, mostly
project leaders, was organised as part of the PTT Research MIPS project.
At the beginning and at the end of the meeting all the MM experts were
asked to fill in a questionnaire (see appendix C.1.).
To test the hypotheses the experts were asked for their importance estimates
for an MM cluster in the short term (0-2 years), in the medium term (2-5
years) and in the long term (>5 years). Interval scales were used for the
estimates, running from 1 (very unimportant) to 5 (very important). The
middle value 3 is 'neutral'. An example of a question is given below:
2. How important is the multimedia cluster teleshopping/telemarketing
for the business of PTT Telecom?
0-2 years:
Very unimportant 1----------2----------3----------4----------5 Very
important
2-5 years:
Very unimportant 1----------2----------3----------4----------5 Very
important
>5 years:
Very unimportant 1----------2----------3----------4----------5 Very
important
The use of this scale results in the following general format of the
null hypothesis and alternative hypothesis with regard to the hypotheses
H3-1 to H3-4:
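A sketch of this general format, assuming a one-sided test of the mean importance estimate against the neutral scale value 3 (which is consistent with the one-sample t-tests reported in the results); the symbol \mu for the population mean estimate is an assumption introduced here:

```latex
H_0:\ \mu \le 3 \qquad\qquad H_1:\ \mu > 3
```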
It was expected that the expert estimates would be highly correlated
with time in the sense that the mean estimates for the long term (>5 years)
would be higher than the mean estimates for the medium term (2-5 years),
and that in their turn the mean estimates for the medium term would be
higher than those for the short term (0-2 years). Thus, the general format
for the H3-3 and H3-4 null hypotheses and the alternative hypotheses is:
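One way to write this format, assuming that \mu_{S}, \mu_{M} and \mu_{L} denote the mean estimates for the short, medium and long term respectively (these symbols are assumptions introduced here), with the null hypothesis of equal means as tested by an ANOVA:

```latex
H_0:\ \mu_{S} = \mu_{M} = \mu_{L} \qquad\qquad H_1:\ \mu_{S} < \mu_{M} < \mu_{L}
```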

5.4.3. Results
As can be seen in summary table 14, no MM cluster was seen as important
for the telecommunication business in the short term (0-2 years) at a
significance level of α=.05. In the
medium term (2-5 years) almost all MM cluster estimates were significantly
above 'neutral', with the exception of the estimates for Security and Electronic
publishing/information retrieval. In the long term (>5 years) all clusters
were seen as important, and all these results are significant or highly
significant. For the cluster means, the same significance pattern
as for the medium term is repeated.
Figure 40. Assessment of importance of MM clusters for a telecommunication
company
The experts saw the teleshopping/telemarketing cluster as important
for the telecommunication business, but this estimate is only significant
for the medium term (p<.01, t=2.82, df=17) and the long term (p<.01, t=5.36,
df=17). Thus, the null hypothesis H3-2₀ can be rejected for
the teleshopping/telemarketing cluster in the medium and the long term.
| Test 1 | 0-2 years: mean | s | t | p< | 2-5 years: mean | s | t | p< | >5 years: mean | s | t | p< | Cluster: mean | s | t | p< |
| AVT | 3.3 | 1.32 | 0.89 | NS | 4.1 | 0.91 | 5.04 | .01 | 4.6 | 0.70 | 9.80 | .01 | 4.0 | 0.85 | 4.93 | .01 |
| Teleshopping/telemarketing | 2.9 | 1.09 | -.44 | NS | 3.6 | 0.88 | 2.82 | .01 | 4.0 | 0.81 | 5.36 | .01 | 3.5 | 0.73 | 2.92 | .01 |
| Infotainment | 3.1 | 1.26 | 0.19 | NS | 3.7 | 1.03 | 2.75 | .01 | 4.3 | 0.96 | 5.66 | .01 | 3.7 | 0.95 | 2.98 | .01 |
| Security | 3.1 | 1.40 | 0.25 | NS | 3.3 | 0.91 | 1.16 | NS | 3.4 | 0.85 | 1.94 | .05 | 3.2 | 1.00 | 1.03 | NS |
| Store & forward services | 2.6 | 0.92 | -1.8 | NS | 3.6 | 0.85 | 3.05 | .01 | 4.2 | 0.81 | 6.41 | .01 | 3.5 | 0.73 | 2.78 | .01 |
| Electronic publishing/IR | 2.6 | 1.04 | -1.6 | NS | 3.1 | 0.76 | 0.62 | NS | 3.5 | 0.99 | 2.15 | .05 | 3.1 | 0.80 | 0.39 | NS |
| Office/process automation | 3.2 | 1.20 | 0.59 | NS | 3.7 | 1.19 | 2.38 | .05 | 3.9 | 0.94 | 4.27 | .01 | 3.6 | 2.38 | 2.38 | .05 |
Table 14. Overview of importance estimates of an MM cluster for the business
of a telecommunication company as assessed by MM telecommunication experts (n=18).
To test the null hypothesis (H3-1₀) that the experts are not positive about
the MM telecommunication clusters in general, a one-sample t-test
was performed on the expert means (see appendix C.1.). The result is that
the H3-1₀ hypothesis can be rejected: the mean expert estimate (3.51) is
significantly (p<.01, df=17, t=3.42) above 'neutral' (3).
The H3-3 and H3-4 alternative hypotheses that there is a positive relationship
between time and the level of importance are formulated above. An ANOVA
was used to test the H3-3 null hypothesis for mean item outcomes and to
test the H3-4 null hypothesis for the teleshopping/telemarketing outcomes.
The outcomes of both ANOVA tests are highly significant (p<.01, see
table 15), meaning that both null hypotheses can be rejected. Given that
no inverse relationship between 'importance' and 'time' is found, the alternative
hypotheses H3-3₁ and H3-4₁ are supported. The expert expectations for the
MM clusters, and for MM teleshopping/telemarketing in particular, increase
significantly with time.
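The F statistic of such an ANOVA can be sketched as follows. The example treats the three time horizons as independent groups, which is an assumption (a repeated-measures design would partition the variance differently); the function name `one_way_anova` and the ratings are hypothetical and are not the data of appendix C.1.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical importance ratings for the short, medium and long term.
short_term = [2, 3, 3, 2, 4, 3]
medium_term = [3, 4, 4, 3, 4, 4]
long_term = [4, 5, 4, 4, 5, 4]
f_value, df_b, df_w = one_way_anova([short_term, medium_term, long_term])
print(round(f_value, 2), df_b, df_w)
```

With three time horizons the between-groups degrees of freedom equal 2, as in table 15. A significant F only shows that the means differ; the direction of the differences has to be read from the means themselves.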
| | df | F | p< |
| T1 Item means | 2 | 16.07 | .01 |
| T1 Teleshopping/Telemarketing | 2 | 6.78 | .01 |
Table 15. Test 1 ANOVA outcomes for item means and
teleshopping/telemarketing in particular.
Reliability
The test-retest reliability is a convenient 'interpretation' of test
reliability (Kidder, 1981). Thus, the same subjects were confronted a second
time with the same test on the same day (about 4 hours later). The results
are summarised in table 16 below.
| Test 2 means | 0-2 years | 2-5 years | >5 years | Cluster |
| AVT | 3.4 | 4.1 | 4.6 | 4.0 |
| Teleshopping/telemarketing | 2.2 | 2.8 | 3.2 | 2.7 |
| Infotainment | 2.8 | 3.8 | 4.3 | 3.6 |
| Security | 2.6 | 2.9 | 3.0 | 2.8 |
| MM store & forward services | 2.5 | 3.1 | 3.5 | 3.0 |
| Electronic publishing/IR | 2.3 | 2.9 | 3.2 | 2.8 |
| Office/process automation | 3.1 | 3.6 | 4.0 | 3.6 |
Table 16. Results of the second test. Mean importance estimates of an MM
cluster for the business of a telecommunication company as assessed by
MM telecommunication experts (n=18).
Figure 41. Scatter diagram showing
the test-retest scores for the test items.
The correlation coefficient between test and retest outcomes is 0.89
(see figure 41 and the summary table below). This high correlation gives
confidence in the reliability of the assessment method. There are two relevant
test-retest reliability problems (see section 5.2.).
A carry-over or learning effect might have been present because
the time between the tests was short (about 4 hours): the subjects could
still remember the answers they gave the first time. The effects of this
are rather unpredictable: it may lead to repeated answers, or it may lead
to more differentiated answers.
Time effects were reduced by keeping the time interval short. Nevertheless,
a time interval effect did occur, as a discussion of MM matters took place
in between the tests.
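The test-retest coefficient can be illustrated with the rounded item means printed in tables 14 and 16. The sketch assumes that the coefficient was computed as a Pearson correlation over the 21 item means (7 clusters times 3 terms); the original computation may have used unrounded data, and the function name `pearson_r` is introduced here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Item means per cluster (0-2, 2-5, >5 years), in the row order of the tables.
test_1 = [3.3, 4.1, 4.6, 2.9, 3.6, 4.0, 3.1, 3.7, 4.3, 3.1, 3.3, 3.4,
          2.6, 3.6, 4.2, 2.6, 3.1, 3.5, 3.2, 3.7, 3.9]
test_2 = [3.4, 4.1, 4.6, 2.2, 2.8, 3.2, 2.8, 3.8, 4.3, 2.6, 2.9, 3.0,
          2.5, 3.1, 3.5, 2.3, 2.9, 3.2, 3.1, 3.6, 4.0]
print(round(pearson_r(test_1, test_2), 2))
```

With these rounded means the coefficient comes out close to the reported 0.89.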
To estimate the test reliability another approach was also used: split-half
test reliability using the Spearman-Brown Formula (see section 5.2.). The
results are given both for the first and the second test in the table below.
The conclusion is that the reliability of the test is satisfactorily high,
about 0.9. The fact that the Spearman-Brown reliability estimates approach
the test-retest reliability coefficient so closely confirms the idea that
the Spearman-Brown estimate is a useful alternative when test-retest reliability
estimates cannot be computed.
| Test-retest reliability | 0.89 |
| Split-half test correlation coefficient | 0.80 |
| Test reliability (Spearman-Brown Formula) | 0.89 |
| Split-half retest correlation coefficient | 0.83 |
| Retest reliability (Spearman-Brown Formula) | 0.91 |
Table 17. Test reliability estimates.
It can thus be assumed that the experts were consistent in their estimates.
To test this assumption a t-test was performed on the estimates of
experts per item for the first and second test. Table 18 shows the outcomes,
which indicate that there is no statistically significant difference
(NS, df=34, t=1.46) between the mean outcomes of test 1 (3.5) and test
2 (3.2).
Statistical differences are present, however, between the mean outcomes
for the first and second test with regard to cluster means (p<.05, df=6,
t=2.57) and, more seriously, teleshopping/telemarketing (p<.01, df=34,
t=2.94). Furthermore, the test 2 teleshopping/telemarketing outcomes even
in the long term are only slightly, not significantly, above 'neutral'
(3.2) (see appendix C.1.).
| | df | n | t | p< |
| Expert means | 34 | 18 | 1.46 | NS |
| Cluster means | 6 | 7 | 2.57 | .05 |
| AVT | 34 | 18 | -0.13 | NS |
| Teleshopping/Telemarketing | 34 | 18 | 2.94 | .01 |
| Infotainment | 34 | 18 | 0.21 | NS |
| Security | 34 | 18 | 1.41 | NS |
| MM Store & Forward services | 34 | 18 | 1.57 | NS |
| Electronic Publishing | 34 | 18 | 1.05 | NS |
| Office & Process Automation | 34 | 18 | 0.05 | NS |
Table 18. Two-tailed two-sample t-test
summary table.
Figure 42. Comparison of mean test and mean retest scores, itemised
for the MM clusters
Although the test reliability is satisfactory (about 0.9), the significantly
lower teleshopping/telemarketing estimates in the retest, which are about
'neutral' and for which the H3-20 hypothesis thus cannot be rejected, make
it difficult to interpret the results with regard to this hypothesis.
Concurrent validity
An indication of the validity of the expert ratings can be obtained
by looking at the correlation of the ratings with a concurrent criterion.
In this case, market growth estimates from OVUM (Jeffcoate et al.,
1993) were used as the concurrent criterion. The outcomes are:
a correlation coefficient of 0.99 between the expert AVT
estimates and the OVUM estimates for revenues from videoconferencing traffic
in Europe and the USA for telecommunication companies; OVUM estimates these
revenues will grow from $ 446 million in 1992 to $ 2,223 million in 2000;
a correlation coefficient of 0.89 between the mean scores for
the short/middle/long term (2.96, 3.57 and 4 respectively) and the OVUM
estimates for revenues from MM PC traffic for telecommunication companies;
OVUM estimates these revenues will grow from $ 1 million in 1992 to $ 1,295
million in 2000.
Figure 43. Comparison of highly correlated,
but significantly different, forecasts for the videoconferencing market.
We should be careful with interpreting these correlation coefficients,
because market forecasts have a limited reliability and validity. For example,
the comparable OVUM and YankeeGroup (1992) estimates for the total videoconferencing
market from 1992-1996 are highly correlated (0.92), since they both predict
ongoing growth, but at a very different growth rate and therefore the forecasts
of The YankeeGroup are significantly (p<0.05; t=2.36, df=4) more optimistic
than the outcomes of OVUM (see figure 43). Which one is valid?
Nevertheless, we can conclude from the high concurrent validity coefficients
that the expert judgements about the growing importance of MM for telecommunication
are in agreement with market forecasts.
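The caveat about correlation coefficients can be made concrete with a small computation. The sketch below uses two purely hypothetical forecast series (they are not the OVUM or YankeeGroup figures) and a helper `pearson_r` written for this illustration. It shows that two forecasts can correlate perfectly while differing strongly in level and growth rate.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical yearly market forecasts (illustrative values only)
forecast_a = [100, 150, 200, 250, 300]
forecast_b = [200, 400, 600, 800, 1000]  # grows four times as fast

r = pearson_r(forecast_a, forecast_b)
mean_gap = sum(b - a for a, b in zip(forecast_a, forecast_b)) / len(forecast_a)
print(r, mean_gap)
```

A high correlation therefore only says that both series move in the same direction; a separate test on the differences is needed to judge whether the forecasts agree in size.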
5.4.4. Discussion
The null hypothesis that MM clusters are not important for the telecommunication
business can be rejected for some clusters and for all clusters in the
long term (>5 years). The hypothesis that MM teleshopping/telemarketing
is not important for the telecommunication business can also be rejected
for the medium and long term. The retest results, however, cast a shadow
over this latter result, probably due to the effects of the discussion
about MM clusters that took place between the test and retest.
Yet, the hypothesis that the importance of MM clusters, and MM teleshopping/telemarketing
in particular, for the telecommunication business does not increase with
time can be rejected in favour of the alternative hypothesis. These expectations
correspond with market forecasts and widespread expectations within the
telecommunications industry.
Thus, one can conclude that it is expected that the MM clusters, among
which MM teleshopping/telemarketing, are becoming important from a telecommunication
company's point of view, but that there is uncertainty about at what speed
this is going to happen and at what level of profitability.
5.5. Expert evaluation
of an MBC and the value added of MR
5.5.1. Introduction
As is shown by an evaluative study (Hoogeveen, 1993c), the paper General
Specification pages catalogue of PTT Telecom, containing a two-page description
of every product and service for the business market, has a number of quality
problems: too low topicality (63%-90%) and too low completeness (58%).
An MBC would solve at least one of these problems: low topicality.
It is then necessary to ask, will an MBC offering MR facilities be effective,
and what MM elements and retrieval facilities have value added?
On the basis of H5, about the (perceived) effectiveness of MRSs for
M&S, it is hypothesised that:
H5-11: The MBC is judged to be better than a paper catalogue.
On the basis of H1, about the value added of MM, and H2, about the value
added of retrieval, it is hypothesised that added MM elements and added
retrieval facilities will be judged positively in case of an MBC.
A number of information types were selected to be judged for the MBC:
video, speech, and colour pictures. A number of retrieval facilities were
also selected for testing: hyperlinking, search fields, graphical browsing
(i.e., backtracking with a history function), full text searches, hierarchical
indexing using menu structures, and browsing through reducible sets using
product names. The MM elements chosen are believed to offer value added
when showing product and services information. The selected retrieval facilities
are the simpler ones, that do not need much explanation for a relatively
inexperienced computer user, as are most sales people.
The general forms of the alternative hypotheses about these elements
and facilities are:
H11: Offering the MM element is judged positively by M&S experts.
H21: Offering the retrieval facility is judged positively by M&S
experts.
The idea is that these judgements on the effectiveness of an MBC and
value added of MM and retrieval give indications of the viability of an
MBC (H3).
5.5.2. Method
Between August 1993 and December 1994 an MBC demonstrator based on the
paper General Specification pages catalogue was developed as part of the
PROMISE project within PTT Research (Derksen, 1994). The MBC demonstrator
contains two main modules: a module for compiling a tailor-made catalogue,
and a presentation module to access the contents of the MBC.
Figure 44. Searching and presenting a product using the MBC demonstrator.
The demonstrator includes all the MM elements and retrieval facilities
mentioned above. Video is offered in a small window at the start of the
'specification pages' of a product or service. Corporate TV commercials
were digitised for this purpose. Speech is offered as part of the video
clips, and is also offered to read aloud the text of the product or service
descriptions. A colour picture is included for every product which can
be blown up to screen size if the picture is clicked on. Since the presentation
module contains the complete selection of MM elements and retrieval facilities
most of the presentation time was dedicated to showing the presentation
module.
An M&S expert panel, consisting of PTT Telecom M&S staff, was formed
to evaluate the MBC demo and test the hypotheses. 18 people
were invited for the expert panel, but only 15 people were actually available
for an evaluation session.
The session consisted of the presentation of the MBC demonstrator, a
short period of questions and answers led by the presenter, and the filling
in of a questionnaire (see appendix C.4.) by the M&S experts.
To test the main hypothesis about the value added of an MBC an interval
scale was used that runs from 1 (much worse) to 5 (much better). The middle
value 3 is 'neutral'. An example of a question is:
To test the hypotheses about the MM elements and retrieval facilities interval
scales were used that run from 1 (very negative) to 5 (very positive).
The middle value 3 is 'neutral'. An example of a question is:
The use of these scales results in the following general format for the
null hypothesis and alternative hypothesis with regard to the hypotheses
formulated in the foregoing section:
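Assuming the usual one-sided test against the neutral scale value 3, which is how the null hypotheses are described in section 5.6.3 ('less than or equal to 3'), this format can be written as follows. The symbol μ is chosen here to denote the population mean judgement on an item.

```latex
\[
H_0:\ \mu \le 3
\qquad \mathrm{versus} \qquad
H_1:\ \mu > 3
\]
```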
The relevant validity threats are test leader effects and demonstration
system effects (see section 5.2.).
5.5.3. Results
An MBC was judged to be better (x̄=3.9)
than the paper catalogue. Thus, the null hypothesis that an MBC is not
judged to be better than a paper catalogue can be rejected (p<.01, t=5.77,
df=13) (see table 19).
Figure 45. Bar chart with mean expert judgements
All the judgements on MM elements and retrieval facilities are in the
expected direction (x̄>3). Since
the test results (see table 19) are significant at a significance level of
.05 (speech, full text) or .01 (all the others), the null hypotheses that
the addition of these elements and retrieval facilities
to an MBC was not judged positively can be rejected.
| H | Item | x̄ | s² | s | t | p< |
| 5-1 | MM catalogue is better than paper catalogue | 3.9 | 0.42 | 0.65 | 5.77 | .01 |
| 1-2 | Judgement on the addition of video | 3.8 | 0.72 | 0.85 | 3.75 | .01 |
| 1-3 | Judgement on the addition of speech | 3.5 | 1.09 | 1.05 | 2.17 | .05 |
| 1-4 | Judgement on the addition of colour photographs | 4.5 | 0.48 | 0.69 | 9.41 | .01 |
| 2-5 | Judgement on hyperlinking support | 4.0 | 0.85 | 0.92 | 4.42 | .01 |
| 2-6 | Judgement on supporting key word searches | 3.6 | 0.58 | 0.76 | 3.22 | .01 |
| 2-7 | Judgement on the possibility of having a history function | 3.9 | 0.44 | 0.66 | 5.49 | .01 |
| 2-8 | Judgement on the possibility of using full text searches | 3.6 | 0.78 | 0.88 | 2.92 | .05 |
| 2-9 | Judgement on the possibility of searching by menus | 3.7 | 0.84 | 0.91 | 3.32 | .01 |
| 2-10 | Judgement on the possibility of searching by browsing | 3.6 | 0.52 | 0.72 | 3.80 | .01 |
Table 19. Summary one-sample t-test results for the MBC demonstrator
The use of colour pictures especially was judged 'positive' to 'very
positive' (x̄=4.5). The use of
video was judged only 'positive' (x̄=3.8),
probably because of the imperfect projection of moving pictures by a
transview on an overhead projector. Speech was judged 'neutral' to 'positive'
(x̄=3.5): reading text aloud was
not seen as a really valuable addition.
All search facilities were judged 'positive' (3.5<x̄<4.5),
with hyperlinking and graphical browsing in the form of a history function
as positive extremes.
Reliability
The outcome of the test reliability computation for all items using
the Spearman-Brown Formula is not really convincing: a reliability of
about .5, due to a low correlation for retrieval items. The test reliability
is more satisfactory (0.78) for the MM items alone.
| | Split-half correlation coefficient | Spearman-Brown Formula |
| All items | 0.34 | 0.50 |
| MM items only | 0.64 | 0.78 |
| retrieval items only | 0.21 | 0.34 |
Table 20. Reliability estimates using the Spearman-Brown Formula
5.5.4. Discussion
The positive judgements on the MBC as demonstrated by the MBC demonstrator
and its elements and facilities are encouraging, but it does not mean that
every paper catalogue should be replaced by an MM catalogue and that all
MM elements and retrieval facilities judged positively are always needed.
Remarks made by the respondents indicated that an MBC should contain additional
(not demonstrated) functionality for customer profiles and customer history,
print functionality, references to happy client situations to demonstrate
the use of products and services, and accessing related information systems.
If the positive expert judgement about the MBC in comparison to a paper
catalogue is a valid and representative measure, then a positive statement
about the viability of MBCs in general seems justified. Yet, a generic
judgement like this does not say much about the viability of an MBC for
a specific business situation.
5.6. Expert evaluation
of a promotional CD-i and the value added of MR
5.6.1. Introduction
Hypotheses, that are comparable to those tested with the MBC demo, were
tested using a demo on a CD-i. There is one main difference: the CD-i was
positioned as a promotional system that can be consulted occasionally by
a wide range of potential customers, whilst the MBC will be used by business
sales staff and business customers.
Nevertheless it is interesting to make comparisons between the outcomes
of both evaluations (see section 5.7.).
The main alternative hypothesis, based on hypothesis H5 about system
effectiveness, is:
H5-11: The CD-i is seen as a useful medium for M&S by M&S experts
Note that 'useful' is used as interchangeable with 'effective'. Further
it is assumed that a medium that proves to be effective for M&S in
terms of meeting business objectives is also viable.
On the basis of H1, about the value added of MM, a number of MM elements,
information types, were selected to be tested on the CD-i: video, music,
pictures, and speech. On the basis of H2, about the value added of retrieval,
a number of retrieval facilities were also selected to be tested: hyperlinking,
hierarchical indexing in the form of menu structures, browsing and search
fields. The MM elements are believed to be of great value for marketing
product information. The retrieval facilities selected are the simpler
ones, that need little explanation for a relatively inexperienced computer
user to use.
The general forms of the alternative hypotheses
about these elements and facilities are:
H11: Offering the MM element is judged positively by M&S experts.
H21: Offering the retrieval facility is judged positively by M&S
experts.
The idea is that judgements on the effectiveness of a promotional CD-i
and value added of MM and retrieval give indications of the viability of
an MPS (H3).
5.6.2. Method
An MR demonstrator on CD-i was developed for PTT Telecom with the theme
'Mobile communication products' between November 1993 and January 1994.
The actual development of this CD-i title was done in collaboration with
the CD-i developer Merlin. The CD-i contains five modules: a mobile products
catalogue, a slide show with these products, news in the form of TV commercials,
an animation to explain the operation of a mobile telephone, a Greenhopper,
and an explanation of how to produce a CD-i in the form of an interactive
sheets presentation.
The CD-i title includes most of the MM elements and retrieval facilities
mentioned in the foregoing section. However, browsing was only implemented
to a limited extent, and search fields were not implemented at all, because
the programming effort this requires on CD-i was beyond the available
budget.
To evaluate the CD-i demo and to test the hypotheses an M&S expert
panel was formed, consisting of PTT Telecom M&S staff. 18 people were
selected for the expert panel, but only 15 people were actually available
for an evaluation session.
The session consisted of a presentation of the CD-i demonstrator, a
question and answer session controlled by the presenter, and the filling
in of a questionnaire (see appendix C.5.) by the M&S experts. The CD-i
session followed on after the MBC demonstration session.
Figure 46. Searching and presenting a product using the CD-i demonstrator.
For the main usefulness hypothesis (H5-1) a nominal (yes/no) scale was
used. The related question in the questionnaire was formulated as follows:
1. Do you think that the CD-i is a useful medium for marketing
& sales? yes/no
Interval scales are used for the hypotheses on the MM elements and retrieval
facilities. The scales used run from 1 (very negative) to 5 (very positive).
The middle value 3 is 'neutral'. An example of a question is:
The use of this scale results in the following general format of the hypotheses
with regard to the MM elements and retrieval facilities:
5.6.3. Results
Of the 15 M&S experts 14 judged the CD-i to be a useful medium for
M&S; 1 expert did not respond. This is a highly significant result
(p<.01, χ²=14.071, df=1). So the null hypothesis
that the MR possibilities of the CD-i are not seen as useful for M&S
can be rejected.
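How a yes/no outcome of this kind leads to a chi-square value can be sketched as follows. The sketch assumes a plain goodness-of-fit test with equal expected frequencies for 'yes' and 'no' among the 14 experts who responded. The function name `chi_square_gof` is chosen here, and the critical value is the standard table value for df=1 at the .01 level.

```python
def chi_square_gof(observed, expected=None):
    """Pearson chi-square goodness-of-fit statistic.

    If no expected counts are given, all categories are assumed
    to be equally likely under the null hypothesis.
    """
    total = sum(observed)
    if expected is None:
        expected = [total / len(observed)] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_CHI2_DF1_P01 = 6.635  # standard table value, df=1, alpha=.01

# 14 'yes' answers and 0 'no' answers; the non-respondent is left out
statistic = chi_square_gof([14, 0])
print(statistic, statistic > CRITICAL_CHI2_DF1_P01)
```

Under these assumptions the statistic equals 14.0, which is close to, but not identical with, the reported value of 14.071, so the original computation may have used a slightly different variant of the test. The conclusion that the result is significant at the .01 level is the same.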
The interesting M&S applications mentioned by the respondents are listed below,
with the number of responses for each category given between brackets:
POI/POS in shops (9), Primafoons, Business Centres, and dealer shops,
for example, to support sales conversations;
instruction for sales people, dealers and customers (8);
demonstrations and presentations at fairs and seminars (6);
support of corporate image and PR (3);
catalogues and product information in general (2);
information transfer towards the consumer at home (1);
order intake, when using a Tele-CD-i (1);
distribution of data and visuals (1).
Figure 47. Mean expert judgements on CD-i MM and retrieval aspects
A graphic representation of the expert judgements is shown in figure
47. Since search fields and browsing were not shown on the CD-i demo the
scale for these two items runs from not necessary (1) to necessary (5).
For the other items, the scale runs from very negative (1) to very positive
(5).
As can be seen from t-test summary table 21, the outcome is not significant
only for browsing. As a result the null hypothesis that browsing
is not seen as a necessary retrieval facility cannot be rejected. The judgement
about browsing was about 'neutral' (3.3).
Table 21 demonstrates that all the MM elements are judged significantly
above 'neutral', meaning that the respective null hypotheses that these
elements are not judged positively (less than or equal to 3) can be rejected.
Both the use of speech and video are judged 'positive' to 'very positive'
(x̄=4.4 and x̄=4.5
respectively). The experts report that the combined use of video and speech
holds attention, and makes the presentation persuasive ("seeing is believing").
| H | Item | x̄ | s² | s | t | p< |
| 1-1 | Judgement on the addition of video | 4.5 | 0.25 | 0.50 | 12.728 | .01 |
| 1-2 | Judgement on the addition of speech | 4.4 | 0.52 | 0.72 | 8.067 | .01 |
| 1-3 | Judgement on the addition of music | 3.7 | 0.64 | 0.80 | 3.725 | .01 |
| 1-4 | Judgement on the addition of colour photographs | 3.9 | 0.55 | 0.74 | 4.947 | .01 |
| 2-5 | Judgement on hyperlinking support | 4.0 | 0.17 | 0.41 | 9.873 | .01 |
| 2-6 | Judgement on the possibility for searching by menus | 3.7 | 0.95 | 0.98 | 2.898 | .01 |
| 2-7 | Judgement on the necessity for searching by browsing | 3.3 | 1.10 | 1.05 | 1.351 | NS |
| 2-8 | Judgement on the necessity for key word searching | 3.6 | 0.97 | 0.99 | 2.583 | .05 |
Table 21. Summary table with t-test outcomes.
Colour photographs and music were judged 'positive' (x̄=3.9
and x̄=3.7 respectively).
A remark made by respondents about the photographs was that they do not
look very dynamic in comparison to video and speech on CD-i. With regard
to music several experts noted that the choice of music is very complicated
because the tastes of people differ so much. Music that is too obtrusive
should be avoided.
The retrieval facility hyperlinking was judged 'positive' (x̄=4.0);
this outcome is highly significant (p<.01, t=9.837, df=14). The use
of menu structures was also judged 'positive' (x̄=3.7);
this outcome was also highly significant (p<.01, t=2.898, df=14). So,
the null hypotheses that hyperlinking and menu structures are not judged
'positive' (values less than or equal to 3) can be rejected. A problem
reported with hierarchical menu structures for retrieval of information
is that it may take a long time to find the right information.
As hypothesised, the experts saw the use of search fields as necessary
(x̄=3.6), and this outcome is significant
(p<.05, t=2.583, df=14).
Reliability
What is the reliability of this expert assessment? In the previous reliability
measurement case the use of the Spearman-Brown coefficient based on the
split halves coefficient proved useful (see section 5.2.). The Spearman-Brown
coefficient is 0.70. This is a reasonable but not high reliability coefficient.
This can be explained from differences in the variance (see s²
in table 21): the variance of the answers for retrieval items is much larger
than for MM items. As can be seen in table 22, test reliability for MM items
only is very high (0.98), whilst test reliability for retrieval items only
is not really convincing (0.51). In other words: there was a stronger consensus
about the value added of the MM items than there was about the value added
of retrieval items!
| | Split-half correlation coefficient | Spearman-Brown coefficient |
| All items | 0.54 | 0.70 |
| MM items only | 0.96 | 0.98 |
| retrieval items only | 0.34 | 0.51 |
Table 22. Reliability estimates using the Spearman-Brown Formula.
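The split-half procedure behind estimates of this kind can be sketched in code. The sketch below rests on assumptions: it splits the items into odd- and even-numbered halves (the way the halves were actually formed is not documented here), it uses a small matrix of hypothetical ratings, and the names `pearson_r` and `split_half_reliability` are chosen for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(scores):
    """Return (split-half r, Spearman-Brown estimate) for a respondents x items matrix.

    Assumption: odd-numbered items form one half, even-numbered items the other.
    """
    first_half = [sum(row[0::2]) for row in scores]   # items 1, 3, ...
    second_half = [sum(row[1::2]) for row in scores]  # items 2, 4, ...
    r = pearson_r(first_half, second_half)
    return r, 2.0 * r / (1.0 + r)


# Hypothetical ratings: 5 respondents x 4 items on a 1-5 scale
ratings = [
    [5, 4, 5, 4],
    [4, 4, 3, 4],
    [3, 2, 3, 3],
    [4, 5, 4, 4],
    [2, 3, 3, 2],
]
half_r, reliability = split_half_reliability(ratings)
print(round(half_r, 2), round(reliability, 2))
```

Applying the same function to the MM items and the retrieval items separately would give the kind of breakdown shown in table 22.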
Generalisability (external validity)
In section 5.8. the results of demonstrating some of the same MM items
in combination with the same CD-i demonstrator to respondents outside KPN,
but within the Netherlands, are presented. Most of these respondents have
an M&S related function. The corresponding items are related to judgements
on the inclusion of video, speech, music and colour photographs in the
system. When the results on these four items are compared, a correlation
coefficient of 0.36 is found. However, when the colour photograph item
is omitted a perfect correlation of 1.0 is found! With regard to the use
of colour photographs both groups of respondents judged significantly differently
(p<.05, t=-2.20, df=32). This is probably due to the fact that the CD-i
was presented to the non-KPN respondents to show a teleshop user interface
and not to show other M&S possibilities.
5.6.4. Discussion
The null hypothesis that the promotional CD-i, on the basis of the given
M&S demonstration, is not seen as useful for M&S by M&S experts
is rejected. What does this say about the effectiveness of promotional
CD-i's? If the assumption is true that the experts give a valid and reliable
judgement, and effectiveness can be equated with 'useful', then a positive
answer can be given. In reality, however, there is a long way to go for
the CD-i medium. Its penetration is not high enough to justify the already
large investments in CD-i marketing applications for consumers. The Tele-CD-i
is not yet on the market.
The expert judgements on the value added of MM elements (H1) and retrieval
facilities (H2) were as hypothesised, although the browsing facility was
not judged significantly above 'neutral'. It seems reasonable to
assume that the facilities judged positively, especially video and speech,
contribute to the positive judgement about the CD-i as a whole as M&S
medium for use by customers. Since the variance for the retrieval items
is large and test reliability for these retrieval items is only moderate,
the test results with regard to the retrieval items should be handled with
some caution. The only exception is hyperlinking.
One possibility to improve the reliability of the test part related
to the retrieval items is to increase the sample size; this would probably result
in more stable sample means for the retrieval items.
If the variance in the answers related to three out of four retrieval
items was caused by differences in the level of experience with retrieval
facilities another possibility would be to include all four retrieval facilities,
also browsing lists and search fields, in the CD-i demo and to confront
the respondents more directly with these facilities by letting them play
with the CD-i demo themselves, and by performing a usability test with
potential users in a usability lab.
5.7. Comparing judgements with regard to the promotional CD-i and the MBC
It is interesting to compare judgements with regard to the promotional
CD-i and the MBC, presented in the foregoing sections. Such a comparison
is of interest because the same group of experts was involved in both demonstration
sessions. A validity threat we should be aware of is an order effect: the
MBC session was held first, followed by the CD-i session. An alternative
that would rule out this effect is a true experimental design such as
a randomised two-group design. However, such designs are often not desirable
or feasible in business settings.
| | Mean CD-i | Mean MBC | df | pooled var. | t | p< |
| Video | 4.50 | 3.77 | 28 | 0.46 | 2.95 | .01 |
| Speech | 4.37 | 3.57 | 28 | 0.77 | 2.49 | .05 |
| Colour photo | 3.87 | 4.43 | 28 | 0.58 | -2.04 | NS |
| Hyperlinking | 3.96 | 4.00 | 26 | 0.49 | -0.14 | NS |
| Menu | 3.67 | 3.70 | 28 | 0.87 | -0.10 | NS |
| Browsing | 3.33 | 3.67 | 28 | 0.79 | -1.03 | NS |
| Search fields | 3.60 | 3.64 | 27 | 0.79 | -0.13 | NS |
Table 23. Overview t-test results with regard to judgement differences
for the promotional CD-i and the MBC.
It is interesting to note that there is no correlation (.01) between
the mean scores in the two test conditions. A two-tailed t-test, to test
the hypothesis that the mean scores of both groups on each variable are
equal, revealed that only with regard to two MM elements statistical differences
were found. These MM elements are the inclusion of video and the inclusion
of speech. With regard to the inclusion of colour photographs clear, but
not significant, differences are found.
Video was more appreciated in a promotional CD-i than in an MBC;
this result is highly significant (t=2.95, df=28, p<.01).
Speech was significantly more appreciated in a promotional CD-i than
in an MBC (t=2.49, df=28, p<.05).
Colour photographs were more appreciated in an MBC than in a promotional
CD-i, but this difference is not statistically significant.
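The video comparison in table 23 can be retraced from the summary statistics alone. The sketch below assumes a pooled-variance two-sample t-test with 15 respondents in each session, which is consistent with df=28 but is an assumption. The function name `pooled_t` is chosen here, and the critical values are standard two-tailed table values for df=28.

```python
from math import sqrt


def pooled_t(mean_a, mean_b, pooled_var, n_a, n_b):
    """Two-sample t statistic and degrees of freedom from summary statistics."""
    standard_error = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, n_a + n_b - 2


# Two-tailed critical values for df=28 from standard t tables
CRITICAL_T_DF28_P05 = 2.048
CRITICAL_T_DF28_P01 = 2.763

# Video row of table 23: CD-i mean, MBC mean, pooled variance
# (group sizes of 15 are assumed)
t_video, df_video = pooled_t(4.50, 3.77, 0.46, 15, 15)
print(round(t_video, 2), df_video, abs(t_video) > CRITICAL_T_DF28_P01)
```

Under these assumptions the video row yields a t value of about 2.95 with 28 degrees of freedom, in line with the table. The colour photograph value of -2.04 falls just short of the two-tailed .05 criterion, which explains why that clear difference is reported as not significant.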
In the discussion with the expert panel members it became evident that
for promotional purposes on a TV based medium (the CD-i) the use of video
and speech is more appropriate than on a PC based medium used in an office
environment for the support of sales people (the MBC). Photographs are
more fit for the M&S office environment because they do not distract
so easily. On a promotional medium, however, they are too static in comparison
with video.
It is interesting to note that no clear differences between the two
test conditions were found with regard to the inclusion of retrieval facilities.
5.8. A survey of the effectiveness of MM teleshop services
5.8.1. Introduction
The VM service is discussed in chapter 3. The viability of the VM depends
largely on the viability of the individual MM teleshop services of information
providers, who make use of the general VM service. A number of interviews
with information providers offering Videotex information services, and
potential investors in information services, were conducted to obtain an
indication of the viability of MM teleshop services for information providers.
Based on hypothesis H1, about the value added of MM for M&S in situations
where effective information and knowledge transfer is required, it is hypothesised
that an MM user interface is judged to be better than a text based interface
(H1-11), that the addition of MM elements is judged 'positive' (H1-21),
and that MM is judged to be useful for marketing and sales by an information
provider (H1-31).
With regard to hypothesis H3, about the viability of MRSs for M&S,
it is hypothesised that an MM teleshop is judged to be viable (H3-41),
and that this judgement will improve if a longer time horizon is taken
(H3-51).
Certain success/risk factors are believed to be relevant (H4). Since
the questionnaire is limited, only a small selection of success/risk factors
was tested (H4-61):
the importance of the innovative image of an MM teleshop service;
the importance of the inclusion of automatic payment functionality,
related to flexible user support, this is considered to be vital for teleshop
services;
competitive advantage as an argument to develop a teleshop service.
With regard to hypothesis H5, about the (perceived) effectiveness of an
MRS for M&S, it is hypothesised that an MM teleshop service is perceived
to be effective in terms of meeting the information provider's business
objectives (H5-71). Further, it is hypothesised that this judgement will
be improved if a longer time horizon is taken (H5-81).
Finally, it is hypothesised that differences will be found between the
judgements of experts from experienced, innovative firms that already run
a teleshop service, and firms that do not have any experience with setting
up and running teleshop services (see demographic hypothesis Hd-91).
| No | Alternative hypothesis |
| 1-1 | An MM user interface is judged to be better than a text based interface (e.g., as used in teletext, Videotex or Minitel) by information providers. |
| 1-2 | The addition of an MM element (video, speech, colour pictures, music, animations) is judged positively. |
| 1-3 | MM is judged to be useful for marketing and sales by an information provider. |
| 3-4 | An MM teleshop is judged to be viable by an information provider. |
| 3-5 | The judged longer term viability of an MM teleshop service is higher than the judged shorter term viability of such a service. |
| 4-6 | A success/risk factor (innovative image, inclusion of automatic payment, competitive advantage) is relevant for an MM teleshop service. |
| 5-7 | An MM teleshop service is perceived to be effective in terms of meeting the information provider's business objectives, i.e., extra revenues, new customers, improved margin, ROI, gaining market share, improving quality of service. |
| 5-8 | An MM teleshop service is perceived to be more effective in terms of meeting the information provider's business objectives in the long term rather than in the short term. |
| d-9 | Judgements made by experienced and non-experienced firms differ. |
Table 24. Overview of alternative hypotheses with regard to the
economic viability of MM teleshop services for information providers.
5.8.2. Method
A group of eleven respondents from experienced teleshop service companies
was approached. Two companies did not wish to co-operate. Next, another
group of ten respondents from comparable companies was approached. These
companies were comparable in the sense that they operate in the same mix
of branches as the experienced group of companies.
The group of respondents (n=19), approached between March and June 1994,
consisted of 6 general managers, 6 marketing managers, 3 project leaders,
2 service development staff members, 1 head of information systems, and
1 general management assistant.
The respondents were first confronted, during an interview session, with a
demonstrator giving an impression of what an MM user interface for an MM
teleshop system looks like. The CD-i demonstrator, described previously,
was used for this purpose. After this demonstration respondents were asked
questions about the MM aspects to test hypotheses H1-x0. Next, respondents
were confronted with scenario 1 and questions about viability (H3-x0) and
meeting business objectives (H5-x0), and so on for scenario 2 and scenario
3. Finally, they were asked some questions about success/risk factors (H4-60).
Scenario 1 contains a description of the situation in the year 1994.
Scenario 2 contains a description of the situation in the year 1999, and
scenario 3 a description of the situation in the year 2004 (see appendix
C.6.). Scenarios 2 and 3 are based on extrapolations of current developments,
which is always hazardous.
An example of an interval scale used for a question about the value
added of MM is given below:
M1. Do you think that this kind of user interface is worse or better
than a text based interface like Teletext, Videotex or Minitel?
Much worse 1----------2----------3----------4----------5 Much better
An example of a nominal-dichotomous question about the viability of
MM teleshop services is:
9. Do you think a multimedia teleshop service is viable for your
company today?
0 Yes 0 No
An example of a question about system effectiveness (meeting business
objectives) for scenario '1999' using again an interval scale is:
5. Do you expect to meet the ROI requirement for a multimedia teleshop
service for your company in 1999?
Certainly not 1----------2----------3----------4----------5 Certainly yes
The use of the interval scale results in the following general format
of the H1-1, H1-2, H4-6 and H5-7 null hypotheses and alternative hypotheses:
It was expected that the expert estimates would be highly correlated
with time in the sense that the mean estimates for the long term (2004)
would be higher than the mean estimates for the medium term (1999), and
that in their turn the mean estimates for the medium term would be higher
than those for the short term (1994). So, the general format for the H5-8
null hypotheses and the alternative hypotheses is:

The hypothesis H1-3 and H3-4 using a nominal-dichotomous scale (yes/no)
can be formulated as hypotheses testing goodness of fit (Kirk, 1978):
Where p is the proportion of the respondents scoring yes or no.
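The three formats described so far can be sketched in a single block. This is a plausible reading, not the original notation. It assumes that the reference value for the interval scales is the scale midpoint 3 ('neutral'), that the alternatives are one-sided, and that the goodness-of-fit test compares the observed split with an even yes/no split. The symbols mu (population mean score) and the subscripts are introduced here for illustration only.

```latex
\[
\begin{array}{lll}
\text{Interval scale (H1-1, H1-2, H4-6, H5-7):}
  & H_0: \mu \le 3
  & H_1: \mu > 3 \\[4pt]
\text{Trend over scenarios (H5-8):}
  & H_0: \mu_{1994} = \mu_{1999} = \mu_{2004}
  & H_1: \mu_{1994} < \mu_{1999} < \mu_{2004} \\[4pt]
\text{Goodness of fit (H1-3, H3-4):}
  & H_0: p_{\text{yes}} = p_{\text{no}} = 0.5
  & H_1: p_{\text{yes}} \ne p_{\text{no}}
\end{array}
\]
```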
Hypothesis H3-5 about the positive relationship of viability expectations
(measured on a yes/no scale) with time can be formulated as hypotheses
about the equality of proportions (Kirk, 1978):
The demographic hypothesis Hd-9 with regard to different scores on interval
scales can be formulated as:
5.8.3. Results
With regard to the value added of MM, it can be concluded from the results
presented in table 25 that the H1-10 and H1-20 hypotheses can be rejected:
the addition of MM elements to the user interface of a teleshop service
was judged 'positive' on all variables (mean of the mean scores per
respondent: 4.2), and an MM user interface was judged to be better than
a text-based interface. The mean MM judgements per respondent are likewise
significantly above 'neutral' (t=9.67, p<.01, standard deviation=0.55,
df=18).
H | Evaluation MM aspects | df | mean | s.d. | t | p<
1-1 | M.1. MM user interface better than text based user interface | 18 | 4.58 | 0.69 | 9.939 | .01
1-2 | M.2. Judging the addition of video negative/positive | 18 | 4.21 | 0.98 | 5.404 | .01
1-2 | M.3. Judging the addition of speech negative/positive | 18 | 4.16 | 0.90 | 5.618 | .01
1-2 | M.4. Judging the addition of colour pictures negative/positive | 18 | 4.47 | 0.84 | 7.636 | .01
1-2 | M.5. Judging the addition of music negative/positive | 18 | 3.68 | 0.95 | 3.153 | .01
1-2 | M.6. Judging the addition of animations negative/positive | 18 | 4.24 | 0.75 | 7.167 | .01
1 | Mean score per respondent | 18 | 4.22 | 0.55 | 9.673 | .01
Table 25. Results of survey with regard to MM aspects measured
on an interval scale.
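The t values in Table 25 can be approximately recomputed from the reported means and standard deviations. The sketch below assumes a one-sample t-test against the scale midpoint 3 ('neutral') with n = 19 respondents (df = 18). The source does not state the test formula, so the function name, the reference value and the one-tailed critical value are assumptions made for illustration.

```python
import math

NEUTRAL = 3.0           # assumed reference value: midpoint of the 1-5 scale
N_RESPONDENTS = 19      # df = 18 in Table 25 implies n = 19
T_CRIT_01_DF18 = 2.552  # one-tailed critical t, alpha = .01, df = 18 (standard t table)


def one_sample_t(mean, sd, n, mu0=NEUTRAL):
    """t statistic for a sample mean tested against a fixed value mu0."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


# (mean, standard deviation) per item, copied from Table 25
rows = {
    "M.1": (4.58, 0.69),
    "M.2": (4.21, 0.98),
    "M.3": (4.16, 0.90),
    "M.4": (4.47, 0.84),
    "M.5": (3.68, 0.95),
    "M.6": (4.24, 0.75),
    "Mean score per respondent": (4.22, 0.55),
}

for label, (mean, sd) in rows.items():
    t = one_sample_t(mean, sd, N_RESPONDENTS)
    verdict = "above neutral" if t > T_CRIT_01_DF18 else "not significant"
    print(f"{label}: t = {t:.3f} ({verdict} at p < .01)")
```

Because the inputs are rounded to two decimals, the recomputed values differ slightly from the reported ones (for M.1 about 9.98 against the reported 9.939), but every item stays above the critical value.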
Figure 48. Judgement on the value added of MM for teleshop services.
H | Evaluation MM aspects | df | % yes | yes | no | χ² | p<
1-3 | M.7. MM is useful for M&S | 1 | 100% | 19 | 0 | 19.05 | .01
Table 26. Results of survey with regard
to MM aspects measured on a yes/no scale.
All respondents saw MM as useful for M&S (p<.01, χ²=19.05, df=1).
This indication points in the same direction as the results
given above.
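The goodness-of-fit test behind this result can be sketched as follows. The code assumes that the observed yes/no counts (19 and 0) are compared with an even split, which is the usual null hypothesis for a dichotomous item. The function name and the choice of an uncorrected Pearson statistic are assumptions, not taken from the source.

```python
CHI2_CRIT_01_DF1 = 6.635  # critical chi-square, alpha = .01, df = 1 (standard table)


def chi_square_gof(observed, expected_props=None):
    """Pearson chi-square statistic for observed counts against expected proportions."""
    n = sum(observed)
    k = len(observed)
    if expected_props is None:
        # Assumed null hypothesis: all categories equally likely
        expected_props = [1.0 / k] * k
    return sum(
        (obs - n * prop) ** 2 / (n * prop)
        for obs, prop in zip(observed, expected_props)
    )


statistic = chi_square_gof([19, 0])  # 19 'yes', 0 'no'
print(f"chi-square = {statistic:.2f}, df = 1")
print("reject H0 at p < .01" if statistic > CHI2_CRIT_01_DF1 else "cannot reject H0")
```

Under these assumptions the statistic comes out at 19.0, close to the reported 19.05. The small difference suggests that the original computation used a slightly different procedure or rounding.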
With regard to scenario '1994' the null hypothesis H5-70 cannot be rejected.
Respondents did not believe that an MM teleshop service today generates
new revenues, improves the margin, helps to reach new customers, meets
ROI requirements or helps to gain market share. Only customer service
was believed to improve if an MM teleshop service is introduced today (p<.01,
t=3.47, standard deviation=1.16, df=18). If we consider the mean scores
per respondent, and thus take all scores into account, it cannot be
concluded that introducing an MM teleshop service in 1994 is judged to
have value added.
|