The response propensities can be used directly for estimation of the target variables by using them as weights. This is called response propensity weighting. The direct approach attempts to estimate the true selection probabilities by multiplying the first-order inclusion probabilities by the estimated response propensities. Bias reduction will only be successful if the available auxiliary variables are capable of explaining the response behavior. The response propensities can also be used indirectly, by forming strata of elements having (approximately) the same response propensities. This is called response propensity stratification. Here the final estimates rely less heavily on the accuracy of the model for the response propensities.
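As a small illustration of the direct approach, the sketch below estimates response propensities as response rates within cells of a single auxiliary variable and weights each respondent by the inverse of its inclusion probability times its estimated propensity. The data, the cell definitions, and the single auxiliary variable are all hypothetical; a real application would model the propensities (e.g., by logistic regression) on several auxiliaries, and propensity stratification would instead group units with similar estimated propensities.

```python
# Sketch of direct response propensity weighting (hypothetical data).
# Each record: (auxiliary cell, responded?, first-order inclusion prob., y if responded)
sample = [
    ("young", True,  0.01, 10.0), ("young", False, 0.01, None),
    ("young", True,  0.01, 12.0), ("young", False, 0.01, None),
    ("old",   True,  0.01, 20.0), ("old",   True,  0.01, 22.0),
    ("old",   True,  0.01, 21.0), ("old",   False, 0.01, None),
]

# Estimated response propensity per auxiliary cell: responders / sampled units.
cells = {c for c, *_ in sample}
rho = {c: sum(r for cc, r, *_ in sample if cc == c) /
          sum(1 for cc, *_ in sample if cc == c) for c in cells}

# Direct weighting: weight_i = 1 / (inclusion prob. * estimated propensity).
num = sum(y / (pi * rho[c]) for c, r, pi, y in sample if r)
den = sum(1 / (pi * rho[c]) for c, r, pi, y in sample if r)
print("propensity-weighted mean:", num / den)
```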

In internet surveys, selecting a proper probability sample requires a sampling frame containing the e-mail addresses of all individuals in the population. Such sampling frames rarely exist. Moreover, general-population sampling frames do not record which people have internet access and which do not. One should therefore bear in mind that people without internet access will not respond to an internet questionnaire.

Table 2. Types of weighting adjustment methods.

Moreover, people who do have internet access will not always participate. Taking these facts into account, it is evident that the ultimate group of respondents is the result of a selection process (mostly self-selection) with unknown selection probabilities.

Some studies have shown that response propensity matching combined with response propensity stratification is a promising strategy for adjusting the self-selection bias in web surveys. Research is ongoing to implement further improvements for response propensity weighting. Propensity score adjustment (PSA) is a frequently adopted solution for improving the representativity of web panels, although it should be noted that there is no guarantee that correction techniques are successful [13]. PSA has also been suggested as an approach to adjusting volunteer panel internet survey data. It attempts to decrease, if not remove, the biases arising from noncoverage, nonprobability sampling, and nonresponse in volunteer panel internet surveys. A few studies have examined the application of PSA for volunteer panel internet surveys [16]. As used by Lee for volunteer panel internet surveys, PSA is assumed to be based on two samples:

(a) a volunteer panel web survey sample $s^{W}$ with $n^{W}$ units, each with a base weight $d_i^{W}$, $i \in s^{W}$, and (b) a reference survey sample $s^{R}$ with $n^{R}$ units, each with a base weight $d_i^{R}$, $i \in s^{R}$. Note that these base weights may not be inverses of selection probabilities, because probability sampling is not used. First, the two samples are combined into one sample $s = s^{W} \cup s^{R}$ with $n = n^{W} + n^{R}$ units. Propensity scores are then calculated from $s$. The propensity score of the $i$-th unit is the likelihood of the unit participating in the volunteer panel web survey ($g = 1$) rather than the reference survey ($g = 0$), given the auxiliary variables $x_i$. Therefore, $g$ in PSA applied to internet survey adjustment may be labeled as sample origin instead of treatment assignment. The combined sample is partitioned into classes $c$ of similar estimated propensity scores. The adjusted weight for unit $j$ in class $c$ of the web sample becomes

$$w_j = d_j^{W} \, f_c, \qquad f_c = \frac{\sum_{i \in s_c^{R}} d_i^{R} \,\big/\, \sum_{i \in s^{R}} d_i^{R}}{\sum_{i \in s_c^{W}} d_i^{W} \,\big/\, \sum_{i \in s^{W}} d_i^{W}} \qquad (1)$$

When the base weights are equal for all units or are not available, one may use an alternative adjustment factor as follows [16]:

$$f_c^{*} = \frac{n_c^{R} / n^{R}}{n_c^{W} / n^{W}} \qquad (2)$$

where $n_c^{R}$ and $n_c^{W}$ are the numbers of reference and web sample units in class $c$.
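A minimal sketch of this class-based PSA weighting, with hypothetical records and class labels; the classes stand in for groups of units with similar estimated propensity scores.

```python
# Sketch of the PSA class adjustment factor (hypothetical data).
# Each record: (sample origin g: 1 = web, 0 = reference, propensity class, base weight)
combined = [
    (1, "c1", 1.0), (1, "c1", 1.0), (0, "c1", 2.0),
    (1, "c2", 1.0), (0, "c2", 2.0), (0, "c2", 2.0),
]

classes = {c for _, c, _ in combined}
dW_tot = sum(d for g, _, d in combined if g == 1)   # total web base weight
dR_tot = sum(d for g, _, d in combined if g == 0)   # total reference base weight

# f_c: reference-sample weight share of class c over web-sample weight share.
f = {}
for c in classes:
    dW_c = sum(d for g, cc, d in combined if g == 1 and cc == c)
    dR_c = sum(d for g, cc, d in combined if g == 0 and cc == c)
    f[c] = (dR_c / dR_tot) / (dW_c / dW_tot)

# Adjusted weight for each web-sample unit: w_j = d_j * f_c.
adjusted = [(c, d * f[c]) for g, c, d in combined if g == 1]
print(adjusted)
```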

Aşan & Ayhan (2013) [5] proposed a methodology of domain weighting and adjustment procedures for free access web surveys, based on restricted access surveys. Some basic variables can be proposed for the data adjustment, namely gender breakdown, age groups, and education groups. Within the available data sources, special adjustments are proposed for the small domains. Adjustments can be made for age groups as well as gender breakdown as follows. Denoting the population domain sizes as $N_{ij}$ and the sample domain sizes as $n_{ij}$ (gender $i$, age group $j$), the cell weighting formulation can be given as

$$w_{ij} = \frac{N_{ij}}{n_{ij}}$$
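A one-line illustration of the cell weighting, using hypothetical population and sample domain sizes:

```python
# Cell weighting sketch: w_ij = N_ij / n_ij (hypothetical domain sizes).
N = [[420, 180], [280, 120]]   # population domain sizes N_ij (gender x age)
n = [[35, 10], [40, 15]]       # sample domain sizes n_ij
w = [[N[i][j] / n[i][j] for j in range(2)] for i in range(2)]
print(w)
```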

The raking formulation can be given as an iterative sequence of row and column adjustments of the cell weights, repeated until the weighted sample margins match the population margins $N_{i+}$ (gender) and $N_{+j}$ (age groups). Starting from initial weights $w_{ij}^{(0)}$, the row adjustment will be

$$w_{ij}^{(t+1)} = w_{ij}^{(t)} \, \frac{N_{i+}}{\sum_{j} w_{ij}^{(t)} \, n_{ij}}$$

and the column adjustment will be

$$w_{ij}^{(t+2)} = w_{ij}^{(t+1)} \, \frac{N_{+j}}{\sum_{i} w_{ij}^{(t+1)} \, n_{ij}}$$
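The alternating row and column adjustments amount to iterative proportional fitting, which can be sketched as follows; the margins and cell counts below are hypothetical.

```python
# Minimal raking (iterative proportional fitting) sketch for a 2x2
# gender-by-age table (hypothetical counts and margins).
N_row = [600.0, 400.0]   # population gender margins N_i+
N_col = [700.0, 300.0]   # population age-group margins N_+j
n = [[50.0, 10.0],       # sample cell counts n_ij
     [20.0, 20.0]]

w = [[1.0, 1.0], [1.0, 1.0]]   # initial cell weights w_ij
for _ in range(100):
    # Row adjustment: scale so that sum_j w_ij * n_ij = N_i+.
    for i in range(2):
        tot = sum(w[i][j] * n[i][j] for j in range(2))
        for j in range(2):
            w[i][j] *= N_row[i] / tot
    # Column adjustment: scale so that sum_i w_ij * n_ij = N_+j.
    for j in range(2):
        tot = sum(w[i][j] * n[i][j] for i in range(2))
        for i in range(2):
            w[i][j] *= N_col[j] / tot

est_rows = [sum(w[i][j] * n[i][j] for j in range(2)) for i in range(2)]
print("fitted row totals:", est_rows)
```

After convergence the weighted sample reproduces both sets of population margins simultaneously, which cell weighting alone cannot do when only the margins (not the full cross-table) are known.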

The sums and proportions of the gender ($i = 1, 2$) and age group cells are illustrated for the e-mail (E) and web (W) surveys as below:

$$n_{i+}^{E} = \sum_{j} n_{ij}^{E} \quad \text{and} \quad p_{i+}^{E} = \frac{n_{i+}^{E}}{n^{E}}, \quad \text{where } n^{E} = \sum_{i} \sum_{j} n_{ij}^{E} \qquad (3)$$

$$n_{i+}^{W} = \sum_{j} n_{ij}^{W} \quad \text{and} \quad p_{i+}^{W} = \frac{n_{i+}^{W}}{n^{W}}, \quad \text{where } n^{W} = \sum_{i} \sum_{j} n_{ij}^{W} \qquad (4)$$
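A small sketch of these margin sums and proportions, using a hypothetical gender-by-age cross-table for the e-mail survey; the web survey quantities would be computed identically from its own table.

```python
# Gender margin sums and proportions for a hypothetical e-mail (E) survey
# cross-table: rows are gender i = 1, 2; columns are three age groups.
n_E = [[30, 25, 15],
       [20, 10,  0]]

n_E_total = sum(sum(row) for row in n_E)          # n^E
n_E_gender = [sum(row) for row in n_E]            # n_i+ per gender
p_E_gender = [m / n_E_total for m in n_E_gender]  # p_i+ per gender
print(n_E_gender, p_E_gender)
```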

The application of this work consists of a first stage based on a web survey with an e-mail invitation and a second stage based on a voluntary participation internet survey. The methodology is also proposed for the estimation and allocation of the population frame characteristics of adult internet users by gender and age groups. The proposed alternative methodologies are a beneficial tool for internet survey users [5].

6. Effectiveness of Weighting Adjustment Procedures

Several of these methods are closely related to one another. For example, post-stratification is a special case of GREG weighting. All of the methods involve adjusting the weights assigned to the survey participants to make the sample line up more closely with population figures. A final consideration differentiating the approaches is that propensity models can only incorporate variables that are available for both the internet survey sample and the calibration sample [15].

When we examine the effectiveness of the adjustment methods in internet surveys, the works of Steinmetz, Tijdens & Pedraza (2009) [17], Tourangeau, Conrad & Couper (2013) [15], and Lee (2011) [11] appear to be important examples. For instance, Tourangeau, Conrad & Couper (2013) [15] presented a meta-analysis of the effect of weighting on eight online panels of nonprobability samples, where the aim was to reduce bias arising from coverage and selection effects. Among other findings, they concluded that the adjustment removed at most up to three-fifths of the bias, and that a large difference across variables still existed. In other words, after weighting the bias was reduced for some variables but at the same time increased for others. The estimates of single variables after weighting could shift up to 20 percentage points in comparison to unweighted estimates [18].

7. Conclusions

Internet surveys already offer enormous potential for survey researchers, and this is likely only to improve with time. In spite of their popularity, the quality of Web surveys for scientific data collection is open to discussion [19] . Many internet surveys use statistical corrections in an effort to remove, or at least reduce, the effects of coverage, nonresponse and selection biases on the estimates.

The general conclusion is that when the internet survey is based on a probability sample, nonresponse bias and, to a lesser extent, coverage bias, can be reduced through judicious use of post-survey adjustment using appropriate auxiliary variables.

The challenge for the survey industry is to conduct research on the coverage, sampling, nonresponse, and measurement error properties of the various approaches to web-based data collection. Sampling methods comparable to those of traditional surveys do not exist for internet surveys. As a result of these sampling difficulties, many internet surveys use self-selected samples of volunteers rather than probability samples. When nonprobability sampling is used for an internet survey, an adjustment procedure should be applied.

We need to learn when the restricted population of the Web does not matter, under which conditions low response rates on the Web may still yield useful information, and how to find ways to improve response rates to internet surveys.

Cite this paper

Zerrin Asan Greenacre (2016) The Importance of Selection Bias in Internet Surveys. *Open Journal of Statistics*, **06**, 397-404. doi: 10.4236/ojs.2016.63035

References

- 1. Schonlau, M., van Soest, A., Kapteyn, A. and Couper, M. (2009) Selection Bias in Web Surveys and the Use of Propensity Scores. Sociological Methods & Research, 37, 291-318. http://dx.doi.org/10.1177/0049124108327128
- 2. Internet World Stats. www.internetworldstats.com
- 3. Web Survey Methodology. www.websm.org
- 4. Bethlehem, J. (2008) How Accurate Are Self-Selection Web Surveys? Discussion Paper, University of Amsterdam.
- 5. Asan, Z. and Ayhan, H.Ö. (2013) Sampling Frame Coverage and Domain Adjustment Procedures for Internet Surveys. Quality and Quantity, 47, 3031-3042. http://dx.doi.org/10.1007/s11135-012-9701-8
- 6. Ayhan, H.Ö. (2000) Estimators of Vital Events in Dual-Record Systems. Journal of Applied Statistics, 27, 157-169. http://dx.doi.org/10.1080/02664760021691
- 7. Ayhan, H.Ö. (2003) Combined Weighting Procedures for Post-Survey Adjustment in Complex Sample Surveys. Bulletin of the International Statistical Institute, 60, 53-54.
- 8. Bethlehem, J. (2008) Applied Survey Methods: A Statistical Perspective. John Wiley & Sons, Hoboken.
- 9. Couper, M.P. (2011) Web Survey Methodology: Interface Design, Sampling and Statistical Inference. Instituto Vasco de Estadística (EUSTAT).
- 10. Bethlehem, J. (2010) Selection Bias in Web Surveys. International Statistical Review, 78, 161-188. http://dx.doi.org/10.1111/j.1751-5823.2010.00112.x
- 11. Lee, M.H. (2011) Statistical Methods for Reducing Bias in Web Surveys. https://www.stat.sfu.ca/content/dam/sfu/stat/alumnitheses/2011/MyoungLee_
- 12. Luth, L. (2008) An Empirical Approach to Correct Self-Selection Bias of Online Panel Research. CASRO Panel Conference. https://luthresearch.com/wp-content/uploads/2015/12/Luth_CASRO_Paper_b08.pdf
- 13. Bethlehem, J. and Biffignandi, S. (2012) Handbook of Web Surveys. John Wiley & Sons, Hoboken.
- 14. Kalton, G. and Flores-Cervantes, I. (2003) Weighting Methods. Journal of Official Statistics, 19, 81-97.
- 15. Tourangeau, R., Conrad, F. and Couper, M.P. (2013) The Science of Web Surveys. Oxford University Press, Oxford. http://dx.doi.org/10.1093/acprof:oso/9780199747047.001.0001
- 16. Lee, S. (2006) Propensity Score Adjustment as a Weighting Scheme for Volunteer Panel Web Surveys. Journal of Official Statistics, 22, 329-349.
- 17. Steinmetz, S., Tijdens, K. and de Pedraza, P. (2009) Comparing Different Weighting Procedures for Volunteer Web Surveys. Working Paper 09-76, University of Amsterdam.
- 18. Callegaro, M., Baker, R., Bethlehem, J., Göritz, A.S., Krosnick, J.A. and Lavrakas, P.L. (2014) Online Panel Research: A Data Quality Perspective. John Wiley & Sons, Hoboken. http://dx.doi.org/10.1002/9781118763520
- 19. Lee, S. (2006) An Evaluation of Nonresponse and Coverage Errors in a Prerecruited Probability Web Panel Survey. Social Science Computer Review, 24, 460-475. http://dx.doi.org/10.1177/0894439306288085