Software Analytics of Open Source Business Software

This paper applies software analytics to open source code. Open-source software gives both individuals and businesses the flexibility to work with different parts of available code, modifying it or incorporating it into their own projects. The open source software market is growing. Major companies such as AWS, Facebook, Google, IBM, Microsoft, Netflix, SAP, Cisco, Intel, and Tesla have joined the open source software community. In this study, a sample of 40 open source applications was selected. Traditional McCabe software metrics, including cyclomatic and essential complexities, were examined. An analytical comparison of this set of metrics and derived metrics for high risk software was utilized as a basis for addressing risk management in the adoption and integration decisions of open source software. From this comparison, refinements were added, and contemporary concepts of design and data metrics derived from cyclomatic complexity were integrated into a classification scheme for software quality. It was found that 84% of the sample open source applications were classified as moderate low risk or low risk, indicating that open source software exhibits low risk characteristics. The 40 open source applications were the base data for the model, resulting in a technique which is applicable to any open source code regardless of functionality, language, or size.


Introduction
Since 1983, software has undergone an evolution. The establishment of open source code has reduced the importance of custom software development in building code bases, and progressive companies have realized that open source code is a valuable business strategy to maintain profitable operation and managed growth. Consequently, open source development has evolved to complement changing application portfolio growth, and application profiles have, as their by-product, hybrid applications composed of open source code.

The Problem
With these developments, the importance of open source software grows, and the question of whether to adopt or integrate it is often challenging to answer. Managers and software developers try to justify a decision based on terms of re-

The Research Methodology
Business analytics (BA) continues its growth in the business profession. It comprises the methods, techniques, and data that are used by an organization to measure performance and develop actionable decisions [1]. Business analytics is made up of statistical methods that can be applied to a specific project, process, or product. The business analytics discipline covers three important areas: descriptive analysis, predictive analysis, and prescriptive analysis. Software analytics is a special domain within business analytics. In a special issue of IEEE Software in 2013, the domain of software analytics was explored. To better frame software analytics, Menzies and Zimmerman defined software analytics as "analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions" [2]. The authors cited "early global models and software analytics" including McCabe's cyclomatic complexity [2].

Organization of the Paper
The previous section was a general background statement to present the topic area to the reader, and it included the objective and the methodology of the research. In the next section, there is a review of open source software growth.
Then, the primary research is presented: the sample open source applications are analyzed, and the classification scheme is defined before it is applied to the sample applications. The final section contains suggestions for further research and conclusions.

Open Source Software
In 1983, Richard Stallman, a programmer from MIT, came up with the idea of the "free software movement." The main motive behind this movement was to give programmers the freedom to study and modify source code according to their needs. This trend results in vulnerability to the failure of open source products and responsibility to support these products and their ecosystems [10]. Open source software is an important tool for helping businesses develop software rapidly and effectively.

McCabe Metrics
Tom J. McCabe authored and published cyclomatic, essential, and actual complexities in IEEE Transactions on Software Engineering. In 2009, his article was chosen by ACM SIGSOFT as one of the 23 retrospective highest impact papers in computer science. In the software engineering discipline, cyclomatic complexity, v(g), is recognized as a software quality attribute. It is also associated with the McCabe Structured Testing Methodology. A v(g) may range from 1 to +∞. For unit level testing using the McCabe Structured Testing Methodology, a basis set of paths equal to a testable module's v(g) is the minimum unit level of testing [11]. Research has shown that when v(g) exceeds 10, the reliability of the testable unit decreases exponentially. Essential complexity, ev(g), is a measurement of a software module's coding structure. When the coding structure violates structured programming constructs, ev(g) grows from 1 toward v(g), depending on how many decision predicates violate the single entry, single exit property of structured programming. In the McCabe automated testing tool, McCabe IQ, a threshold of ev(g) ≤ 3 is set [12]. A higher ev(g) indicates that the maintainability of the software has decreased [13].
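The definitions above reduce to simple graph arithmetic. The following is an illustrative sketch, not the McCabe IQ tool; the function names are assumptions made for the example:

```python
# Illustrative computation of cyclomatic complexity from the edge and
# node counts of a module's control flowgraph: v(g) = e - n + 2.

def cyclomatic_complexity(edges: int, nodes: int) -> int:
    """v(g) = e - n + 2 for a single connected flowgraph."""
    return edges - nodes + 2

def is_high_risk(v: int, threshold: int = 10) -> bool:
    """The research cited in the paper treats v(g) > 10 as high risk."""
    return v > threshold

# A module whose flowgraph has 14 edges and 11 nodes:
v = cyclomatic_complexity(edges=14, nodes=11)
print(v, is_high_risk(v))  # 5 False
```

Under the Structured Testing Methodology, this v(g) of 5 would also be the minimum number of basis paths to exercise at unit level.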
Since the introduction of cyclomatic and essential complexities, the McCabe family of metrics has been expanded. Table 1 contains a summary of the entire McCabe metric set and each metric's quality implications [14]. McCabe design metrics focus on software integration. Module design, design, and integration complexities measure the low level, design volume, and end-to-end integration requirements using the McCabe Structured Testing Methodology. When software requires higher levels of integration effort, it is considered more difficult to qualify and riskier [15].
The McCabe family of metrics has a set of data metrics: ldv(g), pgdv(g), and pdv(g). When considering design qualities such as encapsulation, separation of concerns, and data hiding, data plays a critical role. When software is well encapsulated, exhibits positive data hiding, and behaves with positive separation of concerns, it uses high levels of local data and low levels of public data. While parameter data extends data use into other software units, it is more positive than global data because its use is explicitly coded and limited by sharing between specific software units. For quality inferences, the following summarize the McCabe metrics:
• Cyclomatic complexity: an increase in v(g) > 10 is negative, and software units with higher v(g) are considered riskier.
• Essential complexity: an increase of ev(g) > 3 is negative, and software units with higher ev(g) are considered riskier.
• Design complexity: an increase in S0 is problematic, and larger software solutions are considered riskier.
• Integration complexity: an increase in S1 is problematic, and software with higher end-to-end integration requirements is considered riskier.
• Module design complexity: an increase in iv(g) is problematic, and software with higher low-level integration requirements is considered riskier.
The following definitions, recovered from Table 1, summarize the metric set:
Cyclomatic complexity (v, v(g)): the number of decision predicates in the module plus 1; the size of a basis set of paths for unit level testing; v = e - n + 2, where e is the number of edges and n is the number of nodes in a flowgraph.
Design complexity (S0): the size or volume of the application design; S0 = ∑ iv; 1 ≤ S0 ≤ +∞.
Integration complexity (S1): the size of the high level integration basis set of subtrees; S1 = S0 - n + 1; 1 ≤ S1 ≤ +∞.
Module design complexity (iv, iv(g)): the number of decision predicates (plus 1) that significantly impact calls to subroutines; the size of a basis set of paths for low level integration testing; the v of a reduced flowgraph in which design predicates that do not significantly impact calls to subroutines are logically eliminated; iv risk is based on where the module is located, and a management module should have a high iv; higher iv is riskier.
Local data complexity (ldv, ldv(g)): the number of decision predicates (plus 1) that significantly impact the use of local data; the size of a basis set of paths for local data testing; the v of a reduced flowgraph in which design predicates that do not significantly impact the use of local data are eliminated; lower ldv is riskier.
Public global data complexity (pgdv, pgdv(g), sdv global data): the number of decision predicates (plus 1) that significantly impact the use of public global data; the size of a basis set of paths for public data testing; the v of a reduced flowgraph in which design predicates that do not significantly impact the use of public global data are eliminated; 0 ≤ pgdv ≤ v; higher pgdv is riskier.
Parameter data complexity (pdv, pdv(g), sdv parameter data): the number of decision predicates (plus 1) that significantly impact the use of parameter data; the size of a basis set of paths for parameter data testing; the v of a reduced flowgraph in which design predicates that do not significantly impact the use of parameter data are eliminated; 0 ≤ pdv ≤ v; lower pdv is riskier.
• Local data complexity: an increase in ldv(g) is positive, and software with higher local data use is considered less risky because it exhibits better encapsulation, better separation of concerns, and better data hiding.
• Public global data complexity: an increase in pgdv(g) is negative, and software with higher pgdv is considered riskier because it exhibits poorer encapsulation, poorer separation of concerns, and poorer data hiding.
• Parameter data complexity: an increase in pdv(g) is positive, and software with higher parameter data complexity is considered less risky because it exhibits better encapsulation, better separation of concerns, and better data hiding.
Table 2 contains a summary of selected McCabe metrics for the open source sample used in this study. Forty applications were parsed using McCabe IQ. For each application, the lines of code (LOC) were tabulated. Design complexity, the number of modules (n), µv, µev, and µiv were also calculated for each application. This data is in the "Application" section of the table. The "Risk (v > 10)" section of the table contains the same selected McCabe metrics shown in the "Application" section, but only for modules whose v > 10. To examine the magnitude of Δv, the percentage change of the risk µv from the application µv was calculated.
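The percentage change just described reduces to one line of arithmetic. A minimal sketch, with a hypothetical function name, applied to the grand means reported in the risk metrics discussion (2.3 application-wide versus 21.4 for v > 10 modules):

```python
# Hypothetical helper: percentage increase of the risk-module mean
# v(g) over the application-wide mean v(g).
def pct_increase(app_mean_v: float, risk_mean_v: float) -> float:
    return (risk_mean_v - app_mean_v) / app_mean_v * 100.0

# With the rounded grand means, the increase is roughly 830%; the
# paper reports 824%, presumably from the unrounded means.
print(round(pct_increase(2.3, 21.4)))  # 830
```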

Application Metrics
The forty applications account for over 4 million lines of code written in Java, C++, and C. There are almost 360,000 modules in the sample code. The application µv, µev, and µiv range from 1.1 to 6.0, 1.0 to 3.2, and 1.1 to 4.3, respectively. The weighted grand means for µv, µev, and µiv are 2.3, 1.4, and 2.1, respectively. Only one metric, µev for Git, falls in a negative range (ev > 3). At the application level, the applications exhibit low risk as measured by McCabe metrics.
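A weighted grand mean can be sketched as follows. The paper does not state the weighting basis, so weighting each application's mean by its module count is an assumption made purely for illustration:

```python
# Sketch of a weighted grand mean across applications. Weighting by
# module count n per application is an assumption for this example.
def weighted_grand_mean(means, weights):
    """Weighted average of per-application means."""
    return sum(m * w for m, w in zip(means, weights)) / sum(weights)

# Three hypothetical applications with module counts 1000, 3000, 500:
print(weighted_grand_mean([1.5, 2.0, 4.0], [1000, 3000, 500]))
```

The large application with many modules dominates the grand mean, which is the point of weighting rather than averaging the application means directly.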

Risk Metrics
The high risk modules for the forty applications account for almost 8500 modules. The µv, µev, and µiv for risk modules (v > 10) range from 11.0 to 46.8, 1.0 to 17.7, and 1.1 to 23.0, respectively. The weighted grand means for risk module µv, µev, and µiv are 21.4, 9.3, and 16.9, respectively. Only one metric, µev for weekly_planner, is in a low risk range (ev ≤ 3). The grand mean of µv for risk modules increased by 824%, from 2.3 to 21.4. The magnitude of the percentage increase in µv ranges from 243% to 1555%, indicating a measurable shift in riskiness as µv climbs above 10. Further, as µv increases above 10, there is a corresponding increase in µev: the modules' structuredness and maintainability grow riskier, as µev grows from 1.4 to 9.3. As a collective group, the sample applications' modules exhibit risk, although the "level of risk" is not yet defined.

Comparison
An alternative view of application risk is gained by examining a cross section of applications by size. Measuring application size is an arbitrary process. For this study, size is determined by lines of code using the following groups:
• Extra small: 0 ≤ LOC ≤ 25,000
• Small: 25,000 < LOC ≤ 50,000
• Medium: 50,000 < LOC ≤ 75,000
• Large: 75,000 < LOC ≤ 100,000
• Extra large: 100,000 < LOC
Journal of Software Engineering and Applications
Table 3 contains a selection of extra large, large, medium, small, and extra small applications from the sample data. Axelor has 106,385 lines of code and is classified as large because it is the closest fit for the large grouping. In this table, 21 McCabe unit, design, data, and transformation metrics are shown. The % n, % S0, and % S1 metrics measure the portion of modules, design complexity, and integration complexity residing in the high risk modules (v > 10). For example, in Metafresh, 1% of the application modules are high risk modules. This 1% contains 9% and 24% of S0 and S1, respectively, a disproportionate amount of the design and integration complexities. Total ldv, pgdv, and pdv are included so that additional quality and risk attributes can be evaluated. Earlier in this paper, encapsulation, separation of concerns, and data hiding were referenced. Recall that low use of global data and high use of local and parameter data imply lower risk. In Table 3, note the density ratios of data metrics to cyclomatic complexity (ldv/v, pgdv/v, and pdv/v). This data transformation normalizes McCabe metrics across size classifications. The comparison of extra large, large, medium, small, and extra small applications shows that metrics do not increase due to size as measured by LOC:
• Mes, a large application, has a lower average v(g) than Tmux (23.5), a small application.
• Metafresh, an extra large application, has a high local data density ratio (0.78), while Libevent, a small application, has a low local data density ratio (0.25).
• Mes, a large application, makes much greater use of public global data (0.67) than Tmux (0.11), a small application.
• The three extra large applications, Metafresh, Adempiere, and Git, make less use of parameter data, 0.32, 0.33, and 0.27, respectively, than the two large applications, Axelor and Mes, 0.51 and 0.57, respectively.
• Adempiere, an extra large application, averages more local data use (average 16.1) than Gaussian YOLOv3 (12.4), an extra small application.
• Git, an extra large application, averages less public global data use (3.1) than Tmux (15.2), a small application.
• Axelor, a large application, averages more parameter data use (83.) than Libevent (3.7), a small application.
Size does not dictate the negative or risk measures for the application. The measurement challenge is integrating multiple quality and risk factors into a simple, meaningful format. In the next section, a quality classification algorithm is introduced to support risk assessment for open source software.
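The size groups and density ratios used in this comparison can be sketched briefly. The function names are illustrative, and the bounds follow the stated LOC groups literally (by these bounds Axelor's 106,385 LOC falls into extra large, though the paper places it in the large group as the closest fit):

```python
def size_class(loc: int) -> str:
    """Map lines of code to the study's size groups."""
    if loc <= 25_000:
        return "extra small"
    if loc <= 50_000:
        return "small"
    if loc <= 75_000:
        return "medium"
    if loc <= 100_000:
        return "large"
    return "extra large"

def density_ratio(data_total: float, v_total: float) -> float:
    """Ratio of a total data metric (ldv, pgdv, or pdv) to total
    cyclomatic complexity; normalizes comparison across size classes."""
    return data_total / v_total

print(size_class(40_000))                # small
print(round(density_ratio(78, 100), 2))  # 0.78
```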

Application Risk Score
During thirty years of working with McCabe & Associates clients, anecdotal evidence surfaced regarding their software. Due to non-disclosure agreements (NDAs), clients chose not to publicize metric analysis of their code bases. However, client software exhibited risk factors that can be applied to open source software. For example, when examining client applications, it was observed that high risk modules accounted for 0% to 7.5% of the total modules. Note that in Table 3 the proportion of high risk modules to total modules ranges from 1% to 15%. This observation is incorporated into an algorithm for an application risk score. To calculate an application risk score, 9 metrics and transformations associated with the McCabe metric family shown in Table 3 are utilized. These nine are highlighted below.
1) % n: the percentage of high risk modules to total modules.
2) % S0: the percent of total design complexity for high risk modules to total application design complexity.
3) % S1: the percent of total integration complexity for high risk modules to total application integration complexity.
4) µv: the average cyclomatic complexity for high risk modules.
5) ev density: the ratio of total essential complexity for high risk modules to total essential complexity for all application modules.
6) iv density: the ratio of total module design complexity for high risk modules to total module design complexity for all application modules.
7) ldv density: the ratio of total local data complexity for high risk modules to total local data complexity for all application modules; this ratio is subtracted from one to place ldv density in the same order of magnitude as the other ratios; a high ldv density is positive, so a high "1 - ldv density" is negative.
8) pgdv density: the ratio of total public global data complexity for high risk modules to total public global data complexity for all application modules.
9) pdv density: the ratio of total parameter data complexity for high risk modules to total parameter data complexity for all application modules; this ratio is subtracted from one to place pdv density in the same order of magnitude as the other ratios; a high pdv density is positive, so a high "1 - pdv density" is negative.
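The density ratios in items 5) through 9) share one shape. A minimal sketch, with a hypothetical helper name, of the ratio and the inversion applied to the ldv and pdv densities:

```python
def metric_density(risk_total: float, app_total: float,
                   invert: bool = False) -> float:
    """Ratio of a metric summed over high risk modules to the same
    metric summed over all modules. For ldv and pdv, invert=True
    returns 1 - ratio so that higher values always read as riskier."""
    ratio = risk_total / app_total
    return 1.0 - ratio if invert else ratio

print(metric_density(30, 100))               # ev/iv/pgdv style: 0.3
print(metric_density(30, 100, invert=True))  # ldv/pdv style: 0.7
```

The inversion keeps all nine inputs pointing in the same direction, which is what allows them to be summed into a single score in the next step.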
A weight is added when calculating the application risk score: the applicable v(g) quartile is weighted twice. This weight is justified by the empirical significance of v(g) and the association of v(g) > 10 with high risk software. As shown in Figure 1, the application risk score is calculated by assigning each of the 9 metrics into one of four quartiles: low risk, moderate low risk, moderate high risk, and high risk. Then, the quartiles are combined into an overall risk score for the application.
With 10 data points (recall that µv is weighted twice) used to calculate the application risk score, its magnitude can range from 10.0 to 40.0. Using equal groups for this range, the application risk score classifications are as follows (see the source code risk score in Figure 1):
• 10 ≤ low risk ≤ 17.5
• 17.5 < moderate low risk ≤ 25.0
• 25.0 < moderate high risk ≤ 32.5
• 32.5 < high risk ≤ 40.0
Table 4 illustrates the metrics associated with the risk score and risk classification for 4 sample applications. In this table, low risk, moderate low risk, and moderate high risk are shown. Metafresh and Adempiere have moderate low risk scores (20 and 24, respectively) and classifications; Ofbiz has a moderate high risk score (24) and moderate high risk classification; and samples_maps has a low risk score (15) and low risk classification.
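The scoring scheme above can be sketched end to end. The quartile boundaries below are placeholders (the paper derives the actual cuts from the sample data, per Figure 1); only the 10-point structure and the classification bands follow the text:

```python
def to_quartile(value: float, q1: float, q2: float, q3: float) -> int:
    """Map a metric value to a quartile 1 (low risk) .. 4 (high risk).
    The boundaries q1..q3 stand in for the sample-derived cuts."""
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4

def risk_score(metric_quartiles: list, v_quartile: int) -> float:
    """Sum the 9 metric quartiles and count the v(g) quartile a second
    time, yielding a score between 10.0 and 40.0."""
    assert len(metric_quartiles) == 9
    return float(sum(metric_quartiles) + v_quartile)

def classify(score: float) -> str:
    """Equal-width classification bands over the 10..40 score range."""
    if score <= 17.5:
        return "low risk"
    if score <= 25.0:
        return "moderate low risk"
    if score <= 32.5:
        return "moderate high risk"
    return "high risk"

# An application whose nine metrics all fall in the second quartile
# (v(g) among them) scores 9 * 2 + 2 = 20:
score = risk_score([2] * 9, v_quartile=2)
print(score, classify(score))  # 20.0 moderate low risk
```

Because v(g)'s quartile is counted twice, a single very complex code base moves the total score more than any other metric, which matches the weighting rationale given above.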

Sample Risk Classification
In Table 5, twenty-five open source applications are shown.

Further Research
Based upon the findings of this research, future research should examine the composition of the risk score. The 9 software metrics should be examined to determine whether they are objective proxies for software quality and risk. A limitation of this study is the sample size and sample collection technique. Additional open source applications should be collected, and the platforms for collection should extend to broader open source communities and vendor repositories. This research can also be conducted in different contexts: the risk assessment algorithm can be extended to test potential risk differences based upon application type, application size, and programming language.

Conclusions
An examination of an open source software sample showed that its quality is moderate low risk or lower. The criteria for this assessment were based upon established software metrics and design characteristics. The measurement of cyclomatic complexity is the single most important feature, since it is an accepted, empirically sound software metric in the software engineering discipline for analyzing software quality. The framework for the analysis of risk assessment contains additional proxy measurements of accepted design characteristics: encapsulation, separation of concerns, and data hiding. By quantifying these design characteristics, the risk assessment becomes susceptible to objective review. While the definition of software risk assessment is an important discipline task, it is also important that it be conducted independently and transparently. As with current software development methodologies, the independent variables for software quality are debated among practicing software engineers. Within the software quality framework, unit level, design level, and data metrics are embodied. In addition, the density transformation, one of the features of the model, is included as a measurement of software's risk potential by indicating which modules have the potential to be troublesome during future feature maintenance and expansion.
When the risk classification was applied to the sample applications, it provided risk scores and classifications derived from unit, design, and data metrics. Analysis of the open source software sample code revealed that 84% of the applications exhibited moderate low or low risk design architecture. Since open source code is readily attainable, the defined algorithm can be developed and refined across a wide domain of software functionality. Using McCabe metrics, additional open source applications can be analyzed with speed and accuracy. Using a weighted factor of cyclomatic complexity yielded a usable risk score and classification. This approach further integrates the analysis of software design and promotes the risk assessment process for management and software engineers.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.