The Quality and Reliability of Short Videos about Totally Implantable Venous Access Ports on Douyin and Bilibili: A Cross-Sectional Study
1. Introduction
A totally implantable venous access port (TIVAP) is a central venous access device placed entirely under the skin, which is widely utilized for the administration of continuous infusional chemotherapy, long-term parenteral nutrition, and frequent blood sampling [1]. Because a TIVAP can remain in place for months to years, patient and caregiver education regarding its daily observation and intermittent maintenance is paramount. Without proper care, patients are susceptible to severe complications, including catheter-related bloodstream infections, venous thrombosis, catheter occlusion, and device fracture [2] [3]. These complications impose substantial burdens on quality of life, mental health, and socioeconomic well-being [4]. Consequently, ensuring high-quality public education and promoting guideline-based maintenance are fundamental priorities in public health.
Furthermore, the advent of the digital health era has transformed how the public accesses medical information. A large proportion of the adult population now consults the internet for health-related queries [5]. Short-video platforms are increasingly used for rapid, interactive health communication [6]. However, short videos combine verbal and visual information, which raises concerns about health literacy and reliability [7]. In this evolving information environment, Douyin (the Chinese version of TikTok) and Bilibili are emerging as key sources of health information due to their easy-to-use format, rapid spread, and high interactivity [8]. Their open creation models and restricted pre-publication scrutiny, however, are a double-edged sword. These platforms may facilitate the spread of health misinformation, which can mislead users and contribute to inappropriate health decisions or delayed care [9] [10].
A growing body of work has assessed the quality and reliability of short-video health information across specific diseases [11]-[13]. Yet evidence focusing on TIVAP remains limited. This gap matters because TIVAP management relies heavily on long-term patient education. To fill this gap, we conducted a cross-sectional analysis of TIVAP-related short videos on Bilibili and Douyin. We systematically evaluated the videos using the Global Quality Score (GQS) [14], a modified DISCERN (mDIS) instrument [15], and the JAMA benchmark [16], to comprehensively compare the differences between platforms and author types, and to explore the correlations between user engagement and video quality.
2. Ethical Considerations
This cross-sectional study involved the analysis of publicly available, non-identifiable short videos on social media platforms. The research did not involve human subjects, clinical interventions, or the collection of private personal data. Consequently, in accordance with standard ethical guidelines for internet-based research, this study was exempt from institutional review board (IRB) approval and the requirement for informed consent.
3. Methods
3.1. Search Strategy and Study Selection
We conducted a cross-sectional study to evaluate the quality and reliability of TIVAP-related short videos on Bilibili and Douyin. The systematic search was executed on November 4, 2025, using a Windows-based personal computer located in Guangzhou, China. To ensure consistency and minimize temporal bias, data collection for both platforms was completed on the same day. To reduce personalization and recommendation bias, searches were performed with newly registered accounts in private-browsing mode. The search was conducted using the standardized Chinese keyword “输液港” (“totally implantable venous access port”). Due to the high turnover of short-video content and the fact that users typically engage with the most visible results, we retrieved the top 100 ranked results from each platform based on the default search algorithm. The use of a single, highly specific technical term as the primary keyword was intended to capture the most relevant professional and patient-led educational content while maintaining a manageable sample size for high-granularity expert manual review.
3.2. Video Variables and Content Coding
Uploaders were categorized into two main types: “Professionals” and “Non-Professionals”. To ensure reproducibility, the following operational rules were applied based on platform-specific verification systems and profile descriptions:
Professionals: This category required explicit and verifiable medical credentials.
Verified Institutions: Official accounts of hospitals, clinics, or academic medical centers possessing platform-issued institutional verification badges (e.g., blue/yellow “V” badges).
Verified Clinicians and Nurses: Individual practitioners displaying real-name verification and explicit professional titles (e.g., “Attending Physician”, “Oncology Nurse”) in their biographies, typically supported by platform-specific medical creator certifications.
Non-Professionals: This category encompassed all other content creators lacking verifiable clinical credentials.
Patients and Caregivers: Accounts primarily sharing personal lived experiences, home-care routines, or disease journeys.
General Science Communicators: Accounts focused on health or medical popularization (e.g., health bloggers, biology enthusiasts) but without official medical qualifications or clinical practice backgrounds.
Ambiguous/Unverified Accounts: Accounts claiming medical knowledge but lacking official platform verification or explicit offline credentials. Operational rule: If an account’s professional status could not be definitively confirmed after reviewing their profile and content, it was strictly defaulted to the “Non-Professional” category.
Two independent researchers (clinical nurses) completed all classifications. Any discrepancies in uploader categorization were resolved through discussion to reach a consensus, or by consulting a third senior researcher.
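The default-to-Non-Professional operational rule described above can be sketched as a small decision function. This is a minimal illustration only; the boolean inputs are hypothetical flags derived from manual profile review, not actual platform API fields.

```python
def classify_uploader(institutional_badge: bool,
                      realname_medical_verification: bool,
                      explicit_clinical_title: bool) -> str:
    """Apply the operational rule from Section 3.2.

    Inputs are illustrative flags recorded during manual profile review.
    Any account whose professional status cannot be definitively
    confirmed defaults to "Non-Professional".
    """
    if institutional_badge:
        return "Professional"  # verified hospital/clinic/academic account
    if realname_medical_verification and explicit_clinical_title:
        return "Professional"  # verified clinician or nurse
    return "Non-Professional"  # patients, caregivers, communicators, unverified

# Example: a real-name-verified account with no explicit clinical title
# cannot be confirmed, so it defaults to "Non-Professional".
print(classify_uploader(False, True, False))
```

The key design point is the strict default: ambiguity never upgrades an account to the Professional category.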
3.3. Video Quality and Reliability Assessment
Two trained raters with clinical nursing expertise evaluated all videos independently. Inter-rater reliability for the video quality and reliability scores was assessed using the Intra-class Correlation Coefficient (ICC). Disagreements between the two raters were resolved through discussion to reach a consensus, or by consulting a third senior researcher.
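For illustration, an ICC can be computed from first principles. The sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater) in plain Python; this is a reconstruction under an assumed ICC form, since the specific form used in the study is not stated here.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per video, each row holding the
    scores assigned by each rater, e.g. [[rater1, rater2], ...].
    """
    n = len(ratings)      # number of videos (targets)
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA decomposition of the total sum of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # mean square for targets
    msc = ss_cols / (k - 1)              # mean square for raters
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Perfectly agreeing raters yield ICC = 1.0:
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

In practice a statistics package (e.g. the `pingouin` library's `intraclass_corr`) would report all ICC forms with confidence intervals; the hand-rolled version above only shows the arithmetic.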
GQS score: Overall educational quality was assessed using the GQS (5-point Likert scale) [14].
mDIS score: A modified DISCERN instrument was used to assess reliability (higher scores indicate greater reliability) [15].
JAMA score: Source transparency was evaluated utilizing the JAMA benchmark criteria (score 0 - 4) [16].
SUM score: A composite score summarizing the overall performance of the video across the three aforementioned metrics (mDIS, JAMA, and GQS). It is calculated as the unweighted sum: SUM = mDIS + JAMA + GQS. The total score ranges from 4 to 10. A higher SUM score indicates greater overall educational quality, reliability, and source transparency of the video.
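The composite defined above is plain arithmetic; a trivial sketch for clarity:

```python
def sum_score(mdis: float, jama: float, gqs: float) -> float:
    """Unweighted composite of the three instruments: SUM = mDIS + JAMA + GQS."""
    return mdis + jama + gqs

# Example: a video rated mDIS = 3, JAMA = 1, GQS = 3 receives SUM = 7,
# close to the overall sample mean reported in the Results (7.03).
print(sum_score(3, 1, 3))  # 7
```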
3.4. Statistical Analysis
Analyses were conducted using standard statistical software. Continuous variables matching a normal distribution (e.g., SUM score, JAMA score, GQS score) are presented as Mean ± Standard Deviation (SD) and were analyzed using the independent samples t-test. Non-normally distributed continuous variables (e.g., Likes, Comments, Favorites, Duration) are presented as Median (Q1, Q3) and compared using the Mann-Whitney U test (Z statistic). Categorical variables are shown as counts and percentages. Spearman's rank correlation test was utilized to examine associations between quality scores and engagement metrics. Statistical significance was set at p < 0.05.
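As a minimal, dependency-free illustration of the Mann-Whitney U test with the normal approximation that yields the Z statistics reported in the Results (a sketch on synthetic data, without the tie correction that full statistical packages apply):

```python
from statistics import NormalDist

def rank(values):
    """Mid-ranks: tied values share the average of their 1-based ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_z(x, y):
    """U statistic and two-sided p via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled_ranks = rank(list(x) + list(y))
    r1 = sum(pooled_ranks[:n1])          # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2          # U statistic for sample x
    mu = n1 * n2 / 2                     # mean of U under H0
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p
```

For real analyses, `scipy.stats.mannwhitneyu` (which applies a tie correction and, for small samples, exact p-values) is preferable; the sketch only exposes the mechanics behind the reported Z values.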
4. Results
4.1. Video Selection and Platform Characteristics
The inter-rater agreement for the uploader classification and the scoring systems was excellent. Specifically, the overall ICC for the GQS, mDIS, and JAMA scores was 0.88, indicating high consistency between the two raters. Following the initial search of 200 videos (100 from each platform), a total of 22 videos were excluded based on our predefined criteria (similar or irrelevant content). Ultimately, 178 videos were included in the final analysis (Figure 1).
Figure 1. Flow diagram of the video search and selection process on Douyin and Bilibili.
Among the 178 included videos, 86 videos (48.31%) were from Bilibili and 92 videos (51.69%) were from Douyin. The comparative analysis between the two platforms is detailed in Table 1 and further visualized in Figure 2. Significant differences were observed between the platforms in terms of the JAMA score, Likes, Comments, Favorites, and Duration (all p < 0.05). Specifically, videos on Douyin exhibited significantly higher engagement, with a median of 236.50 Likes, 52.00 Comments, and 77.50 Favorites, compared to 29.00 Likes, 22.00 Comments, and 0.00 Favorites on Bilibili (all Z-tests p < 0.001). The Douyin videos were also significantly longer (median 49.00s vs 14.50s, p < 0.001).
Table 1. Comparison of video quality, reliability, and engagement metrics stratified by platform and author type.
(a) Comparison by Platform

| Variables | Total (n = 178) | Bilibili (n = 86) | Douyin (n = 92) | Statistic | p |
| --- | --- | --- | --- | --- | --- |
| SUM score, Mean ± SD | 7.03 ± 1.68 | 7.05 ± 1.33 | 7.01 ± 1.96 | t = 0.14 | 0.887 |
| JAMA score, Mean ± SD | 1.38 ± 0.49 | 1.20 ± 0.40 | 1.54 ± 0.50 | t = −5.10 | <0.001 |
| GQS score, Mean ± SD | 2.88 ± 0.83 | 2.99 ± 0.73 | 2.77 ± 0.92 | t = 1.75 | 0.081 |
| Likes, M (Q1, Q3) | 114.50 (24.25, 322.50) | 29.00 (3.25, 92.00) | 236.50 (108.75, 733.25) | Z = −7.86 | <0.001 |
| Comments, M (Q1, Q3) | 40.00 (9.25, 117.25) | 22.00 (4.25, 81.00) | 52.00 (20.75, 154.50) | Z = −3.48 | <0.001 |
| Favorites, M (Q1, Q3) | 10.00 (0.00, 88.00) | 0.00 (0.00, 1.00) | 77.50 (18.75, 234.25) | Z = −10.20 | <0.001 |
| Duration (s), M (Q1, Q3) | 33.50 (7.00, 109.00) | 14.50 (3.25, 63.25) | 49.00 (14.00, 197.00) | Z = −3.93 | <0.001 |

(b) Comparison by Author Type

| Variables | Total (n = 178) | Non-Professional (n = 58) | Professional (n = 120) | Statistic | p |
| --- | --- | --- | --- | --- | --- |
| SUM score, Mean ± SD | 7.03 ± 1.68 | 5.17 ± 0.92 | 7.92 ± 1.15 | t = −17.22 | <0.001 |
| JAMA score, Mean ± SD | 1.38 ± 0.49 | 1.00 ± 0.00 | 1.56 ± 0.50 | t = −12.27 | <0.001 |
| GQS score, Mean ± SD | 2.88 ± 0.83 | 2.22 ± 0.82 | 3.19 ± 0.64 | t = −8.62 | <0.001 |
| Likes, M (Q1, Q3) | 114.50 (24.25, 322.50) | 188.50 (86.00, 423.00) | 74.50 (6.00, 297.00) | Z = −3.56 | <0.001 |
| Comments, M (Q1, Q3) | 40.00 (9.25, 117.25) | 36.50 (17.00, 99.00) | 47.00 (7.00, 143.25) | Z = −0.02 | 0.988 |
| Favorites, M (Q1, Q3) | 10.00 (0.00, 88.00) | 84.00 (9.75, 319.00) | 2.00 (0.00, 31.75) | Z = −5.49 | <0.001 |
| Duration (s), M (Q1, Q3) | 33.50 (7.00, 109.00) | 31.50 (8.25, 62.50) | 35.00 (5.00, 126.50) | Z = −0.48 | 0.628 |

t: independent samples t-test; Z: Mann-Whitney U test; SD: standard deviation; M: median; Q1: first quartile; Q3: third quartile.
Figure 2. Violin plots illustrating the distribution of mDIS score, JAMA score, and GQS score across Bilibili and Douyin (*p < 0.05; ****p < 0.0001; ns: not significant).
However, regarding overall content quality, neither the SUM score (7.01 ± 1.96 vs 7.05 ± 1.33, t = 0.14, p = 0.887) nor the GQS score (2.77 ± 0.92 vs 2.99 ± 0.73, t = 1.75, p = 0.081) differed significantly between Douyin and Bilibili. Figure 2 shows that while Douyin had a significantly higher JAMA score (p < 0.0001), Bilibili presented a statistically superior distribution of mDIS scores (p < 0.05). To further delineate the specific dimensions of video reliability, Figure 3 provides a radar chart comparing the percentage distribution across the five key criteria of the modified DISCERN (mDIS) instrument for both platforms.
4.2. Quality and Reliability by Author Type
When the 178 videos were stratified by author type, 58 (32.58%) were uploaded by Non-Professionals and 120 (67.42%) by Professionals. Independent-samples t-tests and Mann-Whitney U tests revealed pronounced differences between the two groups (Table 1).
Figure 3. Radar chart comparing the percentage distribution of modified DISCERN (mDIS) sub-scores between Bilibili and Douyin. Legend: The red line represents Bilibili and the blue line represents Douyin. Each vertex (mDIS-Q1 to mDIS-Q5) corresponds to a specific reliability criterion assessed on the platforms, with the scale indicating the percentage of videos meeting each standard.
Professional videos demonstrated markedly higher scientific rigor, scoring significantly higher on the SUM score (7.92 ± 1.15 vs 5.17 ± 0.92, t = −17.22, p < 0.001), JAMA score (1.56 ± 0.50 vs 1.00 ± 0.00, t = −12.27, p < 0.001), and GQS score (3.19 ± 0.64 vs 2.22 ± 0.82, t = −8.62, p < 0.001).
Paradoxically, despite their lower quality scores, Non-Professional videos attracted substantially more positive user feedback. The median number of Likes for Non-Professionals was 188.50 compared to only 74.50 for Professionals (Z = −3.56, p < 0.001). Similarly, Favorites were drastically higher for Non-Professionals (median 84.00 vs 2.00, Z = −5.49, p < 0.001). The differences in Comments (p = 0.988) and video Duration (p = 0.628) between the two author types were not statistically significant.
4.3. Correlations Between Scores and Engagement Metrics
To further investigate the relationship between video popularity and educational quality, a Spearman’s correlation matrix was generated (Figure 4).
The heatmap illustrates that the engagement metrics most associated with immediate user reaction (Likes and Comments) had no significant correlation with any of the quality metrics (SUM, GQS, JAMA, and mDIS scores; all p > 0.05, denoted by "X" in the matrix).
Conversely, Favorites and Duration exhibited weak to moderate positive correlations with the quality scores. For instance, Duration positively correlated with SUM score (r = 0.42), JAMA score (r = 0.38), and mDIS score (r = 0.33). Additionally, the intrinsic quality metrics (SUM, GQS, JAMA, mDIS) were strongly inter-correlated with one another (ranging from r = 0.49 to r = 0.82).
5. Discussion
In the current age of digital health communications, short-video platforms like Douyin and Bilibili have emerged as significant sources of TIVAP information for patients and caregivers. Proper maintenance of TIVAP requires sustained patient education and compliance; thus, the accuracy of online information directly impacts patient safety and complication rates [17] [18]. In this cross-sectional study, we comprehensively evaluated 178 videos, revealing a distinct discordance between the professional quality of the content and the resulting user engagement.
Figure 4. Spearman correlation heatmap between engagement metrics (Likes, Comments, Favorites, Duration) and quality scores (mDIS, JAMA, GQS, SUM). Boxes with an “X” indicate non-significant correlations (p > 0.05).
Our platform comparison highlights nuanced differences between Bilibili and Douyin. While Douyin showed significantly higher engagement metrics (Likes, Comments, Favorites, and Duration) and JAMA source transparency scores, Bilibili showed slightly superior reliability scores on the mDIS scale (Figure 2). Notably, overall GQS and SUM scores did not differ significantly between platforms. This pattern is consistent with Douyin operating as a highly optimized recommendation system in which longer, more interactive content thrives, whereas Bilibili's ecosystem may favor more focused, educational narratives. The higher JAMA score on Douyin might reflect its stringent real-name verification policy for top medical creators, which enforces greater source transparency [19].
The most striking finding of this study relates to the profound discrepancy in quality and popularity between Professional and Non-Professional authors. Consistent with previous literature [11]-[13], Professional medical authors (67.42% of our sample) provided information with significantly higher scientific rigor, reflected by their higher SUM, JAMA, and GQS scores. However, paradoxically, Non-Professional videos (often patient-generated experiences) received dramatically more Likes (median 188.50 vs 74.50) and Favorites (median 84.00 vs 2.00). This suggests that users, particularly patients, may feel a stronger emotional connection to lived experiences and relatable narratives rather than clinical guidelines. While relatable, Non-Professional content lacks the comprehensive, evidence-based approach necessary for complex TIVAP maintenance. Relying solely on these videos for health education risks dangerous home-care behaviors and delayed complication management.
To further unpack this issue, our Spearman correlation analysis (Figure 4) demonstrates that the number of Likes and Comments has no significant correlation with video quality. This lack of correlation indicates that popularity is not a valid proxy for health information reliability. The weak positive correlations of Favorites and Duration with quality scores suggest that videos which are longer and saved for future reference (Favorites) tend to have slightly better educational depth. However, the overarching conclusion remains: highly visible, widely liked videos on social media do not guarantee high medical quality.
These findings carry clear practical and policy implications for multiple stakeholders. For the public and patients, this study underscores the urgency of developing digital health literacy—the ability to critically appraise online health information [7] [20]. Users must be educated to distinguish between relatable patient experiences and verified medical advice. For healthcare professionals and institutions, a more proactive role in creating authoritative, yet engaging and emotionally resonant short-video content is necessary to compete with highly popular non-professional narratives. For platforms, our results suggest that greater algorithmic responsibility is needed to prevent the spread of suboptimal medical information [21]: incorporating quality assessment indicators into recommendation algorithms could balance popularity against scientific rigor, guiding the ecosystem toward beneficial health outcomes [22] [23].
This study has several limitations. The cross-sectional design depicts associations rather than causal effects, and the sampling describes only one time point in a dynamic recommendation ecosystem. Crucially, the engagement metrics (Likes, Comments, and Favorites) were not normalized for exposure time (i.e., the time elapsed since the video was uploaded). Because raw engagement counts naturally accumulate over time, comparing absolute numbers across older and newer videos without adjusting for video age presents a potential bias in our popularity analysis. Additionally, we relied on publicly visible engagement data (Likes, Comments, Favorites) and lacked access to platform back-end distribution metrics (e.g., impressions or total watch-time), which would have provided a more mechanistic understanding of why some videos achieve high visibility.
To our knowledge, this is one of the first studies to systematically measure and contrast the quality, reliability, and popularity of TIVAP-related short-video information on Douyin and Bilibili. Our findings reveal a distinct discordance between reach and trustworthiness, which can be utilized to educate patients, engage clinicians more effectively, and govern health-information quality on a platform-by-platform basis.
6. Conclusion
The current cross-sectional investigation demonstrates that the quality of TIVAP videos on short-video platforms is highly dependent on the author's professional background. While Professional authors provide significantly more reliable and scientifically sound content (higher SUM, JAMA, and GQS scores), Non-Professional videos attract substantially more user engagement (Likes and Favorites). Crucially, the engagement metrics (Likes and Comments) show no significant correlation with the intrinsic quality of the videos, indicating that popularity is an unreliable measure of medical accuracy. In practice, platforms ought to prioritize verified medical content in their recommendation algorithms, and clinicians must proactively improve their digital communication strategies to elevate public health literacy.
NOTES
*Corresponding author.