Learning to Support Derivation of Adaptable Products in Software Product Lines

Software product line engineering is a large scale development paradigm based on mass production. It consists in building a common platform from which a set of products can be derived. Under the constraints of continuous evolution and costs optimization, the derivation process must be able to an-swer customers’ requirements and provide adequate products in a short time without defects. However, this purpose cannot be achieved if the customer must wait for the change is implemented in the common platform. In this paper, we describe our work which proposes a framework to manage derivation of adaptable products. An adaptable product is obtained by deriving the most similar product from the common platform and changing it to support the new requirements. The aim of the framework is to give quick response to the customers need while the new requirements are being implemented in the common platform. The framework includes tools for processing natural language requirements, computing the similarity between products on the basis of their requirements, and the product adaptation effort measuring.


Introduction
The Software Product Line Engineering (SPLE) [1] is an approach that aims at creating individual software applications based on a core platform, while reducing the time-to-market and the cost of development. Many SPLE-related issues have been addressed both by researchers and practitioners, such as variability management, product derivation, reusability, etc. The focus of our work will be on products derivation in evolvable software product lines (SPL).
The classic derivation of products in software product lines consists in selecting features from the core platform and deriving a product using the existing software components. As a result, products cannot be adaptable to new requirements and cannot support new features unless the new requirements will be implemented in the level of the core platform. However, the evolution of the core platform can be costly and time consuming. Our study aims at finding a solution to the problem of deriving adaptable products while the implementation of new requirement is in progress, the objective is to provide a product that meets a customer's new requirements the earliest possible without waiting for their implementation at the platform level.
Our approach allows formalizing the representation of the existing requirements and also the existing products, the new requirements are expressed in the same way. Based on these representations, a set of algorithms are proposed to compare the new requirements with the existing ones in order to identify the most similar product. Therefore, the decision to adapt the selected product in response to the new requirements will be taken depending on the similarity level and the complexity of the adaptation task. An automated tool will be developed and verified using a case study in the mobile domain.
The remainder of the paper is organized as follows. Section 2 presents the background and the motivation. In Section 3, we define our approach. In Section 4, we describe the automated tool. Section 5 presents related works and Section 6 concludes the paper and gives the work perspectives.

Background and Motivation
In this section, we introduce the background of this study. We start by introducing the SPL engineering paradigm and the SPL evolution challenges, then we focus on the challenge of delivering products that respond to new business needs quickly until they are implemented in the SPL platform.

Software Product Lines
The software product line engineering consists on building a common platform for a set of products dedicated for a specific business domain [1]. The main advantage of the software product line engineering is the improvement of the productivity by reducing the time to market and the costs [2]. By using software product lines (SPL) customers will get products adapted to their needs and wishes at a reasonable price and a short time.
Software product line engineering encompasses two engineering processes.
The first process is the domain engineering in which the common platform is developed including the requirements, the design, the realization and testing.
The second process is application engineering in which the products of the family are derived from the common platform [2].
Evolution of software product lines is a complex task because it has two levels: the level of the products and the level of the platform, the requirements changes must be propagated in the two levels which requires a well understanding of the change and its impact. Due to the complexity of SPL evolution, much research was done to deal with SPL evolution issues, we present here some of them: Modelling Evolution [3] [4] [5]: The objective of modelling evolution is to give a set of controlled and rigorous tasks to implement the change. Hence, the evolution is separated into a set of atomic operations independent from each other performed on a set of evolvable elements.
Traceability of the change [6] [7] [8]: The traceability approaches identify and trace existing links between the platform assets and also the links between the platform assets and the assets of the derived products. The goal is to make the evolution easier by determining how to propagate a change in a set on linked elements.
Co-evolution of domain and application engineering [9] [10]: this field of research focus on how the domain and the application processes evolve together and try to find changes that were implemented in one level in order propagate them to the other level.
Post-evolution verification [4]- [10]: The verification approaches study the risk of breaking the integrity of a software product line after changing it. They aim at improving the safety of software product line evolution by providing a toolset for checking if after refinement the software product line still generates well-formed products.
Model defects analysis: the approach dealing with model defects in SPL try to find defects caused by evolution such as inconsistency [5], incompleteness [11], ambiguity [11], and duplication [12] [13]. They also propose solutions to correct these defects. The majority of them focus on the feature level.
In our previous work [3], we proposed an approach to study the co-evolution between the platform and the derived products in order to correct the previous shortcomings in the evolution impact analysis tasks. In this paper we deal with evolution otherwise, we try to give quick response to a customer new requirements by selecting the most similar product to its needs and studying the cost of adapting it to support the requirements that are not implemented in the software product line platform. On the basis of the study, this solution can be adopted in agreement with the customer while waiting for the SPL change achievement.

Requirements Similarity Analysis
The concept requirement can be defined as follow: "A condition or capability needed by a user to solve a problem or achieve an objective" [IEEE610. ].
Requirements are the basis for every system, defining what the customers need from the system and also what the system must do in order to satisfy that need. They are generally expressed in natural language (NL), the language understood by all the software stakeholders. Requirement engineering [14] includes four principal activities: elicitation, documentation, validation and negotiation, and management [15]. Evolution of software product lines is expressed through requirements. Hence, the majority of works deal with evolution in the requirements or feature levels. In front of new requirements, the SPL analysts can choose one of the following scenarios to satisfy the customer needs: • Implement the new needs at the domain level: customer must wait for the change implementation, then the desired product can be derived • Study the possibility of adapting the existing products: in this case the most similar product is selected, we mean by the most similar product the one which supports the maximum of intended requirements. The product can be adapted and delivered while the new requirements are added in the domain level. The response time is the inconvenience of the first scenario. For this reason we adopted in this paper the second scenario. In further sections, we describe our approach for similar product selection and adaptation in evolvable software product line.

Approach Description
The objective of our work is to provide the product that can satisfy customers even if the customer requirements are not completely supported by the software product line. The decision of adapting the products or implementing the modification in the platform is taken by stakeholders depending on the complexity of the modification. The solution proposed consists of a three-process framework. The first process consists in transforming the requirements into a formal presentation, the second process consists in selecting the most adequate product by comparing the new requirements with the existing products requirements, and the third process involves the measurement of the adaptation complexity. The overview of the framework is depicted in Figure 1. In what follows, we will explain in details the framework processes.

Process 1: Formalization of NL Requirements
The requirements related to a product are documented using natural language because it is the easiest way for customers to express their needs. This way of modeling does not enable the comparison of these requirements with the old ones because the software product line platform support a great number of requirements. Thus, a formal representation is necessary.
We adopt Natural Language Processing (NLP) approach to transform a textual requirement into a tree-model. NLP is a technology of computer science whose aim is to convert text in natural language into a formal representation understandable by computers, it includes information extraction and syntax and semantic processing. However, a complete semantic analysis is still a subject for research.
The main NLP tasks are: • Words extraction and text representation: aims at extracting terms (verbs, nouns, …) from a text and representing them into a vector.
• Part-Of-Speech Tagging (POS): aims at labeling each word with a unique tag that indicates its syntactic role.
• Chunking: aims at labeling segments of a sentence with syntactic constituents such as noun or verb phrase (NP or VP).
• Semantic Role Labeling (SRL): aims at giving a semantic role to a syntactic constituent of a sentence. In our work we focus on the two first tasks, semantic analysis will be done in the second process. We formalize NL requirements using NLP techniques as depicted in Figure 2.
• Sentence Detector: enables the separation of sentences by putting each sentence in a different line.
• Parser: converts the sentence into a tree that represents the syntactic structure. Each word of the sentence is marked with a POS tagger (Part of-Speech tagger) that represents the role of this word in the English grammar.

Process 2: Identification of the Most Similar Product
Finding similar products consists in comparing the new requirements with the existing ones, the product which contains similar requirement is considered as a candidate product, the decision of taking the product among the proposed solutions depends on his similarity level and adaptation complexity.
In the literature the similarity between requirements is studied using the vector space model (VSM) [16]. In VSM, a requirement is represented as a vector of identifiers or terms. This linear representation induces to many semantic analysis problems such as polysemy (same term that have different sense) and synonymy. For example we consider the two following sentences.
• S1: user must be able to collect prospects data by phone or from external databases. • S2: the prospecting process starts by gathering information by telephone or from other information sources. The two sentences S1 and S2 means the same thing, however if we consider their linear representations S1 (user, collect, prospects, data, phone, external, databases) and S2 (prospecting, process, start, gathering, information, telephone, other, information, sources). The two sentences are not similar. Hence, we adopted the hierarchical semantic representation of sentences as depicted in Figure 3.
The semantic hierarchy of terms in two tree models shows more clearly the similarity between the two sentences. In what follows we present the semantic tree model and the similarity calculation method using the tree model.

Semantic Tree Model
We define a semantic tree model as a hierarchical classification of a domain concept. In order to build such model for requirements we must have a taxonomy of the software product line business domain. A taxonomy is a set of concepts with PartOf or InstanceOf relations between them. In the software product lines domain, a taxonomy refers to a classification of domain concepts or features related with InstanceOf relations. The semantic tree of requirement is constructed on the basis of its syntax tree resulted from the requirements NLP processing and the SPL domain taxonomy. In the rest of this section we define the SPL domain taxonomy and the semantic tree model.

1) Taxonomy of SPL business domain
A taxonomy Tx is defined as a set of concepts and Instanceof/partOf relations between them ( ) , Tx Fx Rx = . In software product lines the concepts of taxonomy includes, features, business concepts, and measures. These elements are classified with instanceof/partOf relations that allow for hierarchical classification of them. The taxonomy is used in requirements NLP processing specifically in terms extraction. Also, when constructing the semantic tree to locate a concepts among the others and collect their parents.
2) Semantic tree model A semantic tree model is a graph composed of multiple nodes, each node represents a concept in its leaf and has one or more parents and one or more children. The edges of the tree depicts the relations between concepts. We consider the following representation of the Tree

• ( )
Dfr n : Distance from root function returns the distance between the node n and the root of the tree.

Similarity Measuring
At this level we consider our requirements formalized into a set of tree models. In this section we use the notation presented in the previous section in the different computations. The similarity between two semantic trees is defined as follows:

( )
, i j S n n is the similarity between a node n i in the tree T1 and a node n j in the tree T2, n and m are respectively the cardinalities of N1 and N2. Each n i is compared with all the nodes n j of N2, then the most similar n j is selected by the maximum function.
We compute the similarity between two nodes n i and n j as follows: w Sc c c w Sp P n P n w Sd Dfr n Dfr n w Sl l l S n n w w w w − − * + * We consider the following attributes of a node n i in the tree T1: c i : The concept of the leaf of the node n i Similarity of each attribute is calculated separately and multiplied by a particular weight, we propose in Table 1 the attributes weight values ordered by their influences on the nodes similarity.
For example: we compute the similarity between the two concepts prospect and candidate. Prospect has 11 synonyms where candidate is the second one.

Process 3: Adaptation Cost Measuring
In this step we measure the cost of the similar product adaptation. The adaptation is the fact of implementing the requirement of intended product for which we did not found a similar requirement in the selected product. The cost can be calculated using the three following techniques: • Expert opinion: the technical experts of the SPL system can estimate the cost of adaptation on the basis of their experience in SPL realization and evolution. • Analogy: the cost of adaptation effort can be measured by comparing the im-plemented operations with the operations to be implemented.
• Cost estimation models: such models estimate the cost on the basis of many factors, Constructive Cost Model COCOMO II [17] the most widely used model. In our approach we propose to combine the three methods, we use the COCOMO II model to estimate the adaptation effort then we validate it with experts by analogy with other development projects.

Tool Support for Adaptable Product Derivation
In order to implement the proposed approach we propose a tool called APD-tool (adaptable product derivation tool). In this section we describe the tool features and its architecture.

APD-Tool Features
The proposed tool support the process illustrated in the approach description, the following functionalities support and implement the three processes: • The import of textual description of the intended product composed of its NL requirements as a set of sentences.
• The processing of the textual description of the intended product in order to generate a graph of semantic Tree models from the NL requirements.
• The processing of existing products NL requirements to generate a graph of semantic tree models for each product.
• The measuring of similarity between the intended product and the existing ones.
• The selection of the most similar products.
• The estimation of adaptation effort costs for each similar products.

ADP-Tool Implementation
In order to implement the features described earlier, we designed the ADP-tool as depicted in Figure 4. The tool includes an interface developed with EclipseIDE which is a java Integrated Development Environment (IDE). The interface facilitates the selection of NL requirements files for users, and also show the resulted similar products with the requirement that must be implemented to adapt them to the customer need.
The syntax processing of the textual NL requirements is performed with the openNLP tool [18], which is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and co-reference resolution. It also includes an evaluation tool that measures the precision of entity recognition and provides information about the accuracy of the used model.
The syntax trees graph generated using the OpenNLP tool are processed then by an algorithm in order to create the semantic graph from the intended product and the existing derived products. We suppose that each NL requirement is expressed in an independent sentence. This algorithm creates a tree for each NL requirement sentence using their syntax graph and the taxonomy of the SPL domain.
The similar products are selected using the algorithm of similar product derivation, which performs the computation of the trees similarity as presented in Section 3.2.2. Each NL requirement Tree is compared with all the requirements trees of a candidate product, then the most similar requirement is selected. The similarity between the two graphs is the average of their trees similarities.
When the similar products are selected, their adaptation to include the not implemented requirements is then studied. The adaptation cost effort is calculated using COCOMO II model. We can use an online tool which allows for applying the model and calculate the cost.

Related Works
In the literature many works were proposed to deal with requirements similarity, however there is no approaches dealing with requirement similarity analysis in the SPL field.
The authors in [19] propose an approach to identify a hierarchy of a set of software products. The hierarchy provides a classification scheme associated with a set of attributes, a non-classified item is included in a specific location based probabilistic matching among the attributes of products in the hierarchies, and the known attributes of the item. Such hierarchy helps identifying the similarity level between products of a specific domain.
A new configuration approach for software product lines based on the user preferences [20]. Instead of selecting features from the domain features to satisfy the user need, a new constraint is added in the configuration process which is user preferences. To allow more fine-grained user customization at runtime, user profiles are introduced which represent the desires of users. In particular, this means the desire to select or deselect features under certain circumstances.
A. Benlarabi et al. Another approach was presented in [21], which speedy reaction to new context conditions and better support for evolution of Wireless Sensor and Actuator Networks, WSANs. They focus on those WSAN features that are susceptible to being managed by a dynamic software product lines (DSPL) approach. They propose a tool to reconfigure the products DSPL of WSAN at runtime. The reconfiguration involves changes at execution time in specific parts of code to give a flexible solution.
A concept similar to our work is the context variability [22], it consists in combining domain features with contextual information for product line derivation. According to contextual information the products to derive is determined, and additional contextual changes are incorporated at execution time.
Finally, works that deal with reconfiguration of products in a SPL at derivation time focus on contextual variability especially the environmental changes and the resources constraints. In our work we focus on functional requirements evolution because of their priority and the frequency of their changes.

Conclusion
In this paper, we introduced our framework for adaptable products derivation, which studies the similarity between an intended product and the existing products in software product lines based on their requirements expressed in natural language. Our goal is to optimize the response time to customers business needs, instead of adopting classical approach which consists in waiting the implementation of the new requirements in the common platform and then building the relevant product, our solution allows for a new approach by selecting the most similar product from the existing ones and computing the cost of its adaptation. The stakeholders can select one approach, depending on the adaptation effort cost. So far, we have proposed a tool that automates the three processes of our framework that are: 1) transforming the natural language requirements into a semantic trees, 2) selecting the most similar product by calculating the similarity function between the intended product and the existing ones, 3) measuring the cost of adapting the selected product. In our future work, we intend to implement the automated tool and validate the results through two cases studies, one in the mobile field and the other in the customer relationship management field.