Visualized Analysis Model for Hadoop Business Data

With social development, we are stepping into a world of information technology in which e-business makes life increasingly diverse and rich. E-business brings not only convenience but also large amounts of business data, and how to better store, manage and use these data has become a major research topic in e-business. With the rapid growth of data volume, the relational database system can no longer meet current requirements. This paper focuses on the visualized analysis model for Hadoop business data and analyzes business data in terms of the visualization platform, the database and the analysis model. The analysis aims to improve offline data analysis and data visualization for the Hive database, and to provide references and suggestions for the visualized analysis model for Hadoop business data.


Z. X. Wang DOI: 10.4236/jcc.2018.6700215 Journal of Computer and Communications

Introduction
With the rapid development of society, people live in a world full of information. Information carriers are increasingly numerous; smart mobile devices and TV commerce websites are among the most commonly used. These carriers generate and deliver large amounts of business data. Some of these data are useful: if people recognize the value of the information in time and use it rationally, it can help them analyze trends correctly and make the right decisions. Data visualization extracts valuable information from a large body of information and presents it as charts and figures; in other words, it is a visual form of presenting data. In business intelligence, decision-makers normally summarize and analyze previously obtained data and experience, and try to innovate and improve on the basis of the original data, in order to gain an advantage in competition. However, extracting valuable information from a large amount of information is complicated; doing it manually wastes human resources and lowers extraction efficiency.
Data visualization therefore delivers business data to people through charts or figures, so that they can obtain valid information more conveniently, and it greatly helps the analysis of business data [1]. This paper analyzes the existing Hadoop platform technology and related technologies, studies the visualized analysis model for Hadoop business data, and presents an experiment that verifies the model.

Analysis on Existing Hadoop Platform Technology
The servers used to build traditional e-business systems were quite expensive, and a relational database system was used as the business database. Driven by cloud computing and the Internet, business data is growing exponentially; the traditional database system cannot handle this growth well and fails to satisfy the basic requirements of data analysis and data processing.

Study on Hadoop and Other Technologies
Hadoop is a basic framework for distributed systems developed by the Apache Software Foundation. With it, users can develop distributed programs without knowing the underlying details of distribution. Once a distributed program is developed, high-speed computation and storage can be achieved by using the power of the cluster [3]. Hadoop has both a narrow and a broad meaning. In the narrow sense, Hadoop is equivalent to Hadoop Core, which consists of HDFS and the MapReduce engine; in the broad sense, it is the Hadoop ecosystem, which consists of Hadoop and open-source tools such as HBase, Sqoop and ZooKeeper. The following study focuses on the HDFS framework, the MapReduce framework, the Hive data warehouse and the HBase database.
The Hive data warehouse is a distributed warehouse tool in Hadoop, and all of its data is stored in HDFS. Hive maps structured data files in HDFS to database tables, which present the data more clearly, and the data can then be queried in an SQL-like language.
The high-level design of HDFS is simple: it has only two kinds of components, the NameNode and the DataNodes, which communicate over TCP/IP. Normally, an HDFS cluster has one NameNode and several DataNodes, and a dedicated machine runs the NameNode instance. The NameNode does not proactively contact the DataNodes: by design it never initiates RPCs and only responds to RPC requests issued by DataNodes or clients [4], as shown in Figure 1.
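The request-and-response relationship between the NameNode and the other components can be illustrated with a toy model. The sketch below is not Hadoop code: the class names, method names and block identifiers are hypothetical, and real HDFS uses its own RPC protocol over TCP/IP. It only shows that the NameNode keeps metadata and answers calls that DataNodes and clients initiate.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy NameNode: holds block metadata and only answers incoming requests."""
    block_map: dict = field(default_factory=dict)  # block id -> set of DataNode ids
    live_nodes: set = field(default_factory=set)

    def handle_heartbeat(self, node_id: str, blocks: list) -> str:
        # Invoked BY a DataNode; the NameNode records the report and replies.
        self.live_nodes.add(node_id)
        for block in blocks:
            self.block_map.setdefault(block, set()).add(node_id)
        return "ack"

    def handle_locate(self, block: str) -> list:
        # Invoked BY a client; returns the DataNodes that hold the block.
        return sorted(self.block_map.get(block, set()))


@dataclass
class DataNode:
    """Toy DataNode: it is the side that initiates the call."""
    node_id: str
    blocks: list

    def send_heartbeat(self, namenode: NameNode) -> str:
        return namenode.handle_heartbeat(self.node_id, self.blocks)


namenode = NameNode()
for dn in (DataNode("dn1", ["blk_1", "blk_2"]), DataNode("dn2", ["blk_2"])):
    dn.send_heartbeat(namenode)

# A client asks where a block lives; the NameNode only replies.
locations = namenode.handle_locate("blk_2")
print(locations)
```

Note that `NameNode` has no method that calls into a `DataNode`; every interaction starts from the DataNode or the client side.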

Study on Data Visualization
Today, several forms of data visualization are widely used in business activities, including the matrix graph, the teaching coordinate diagram and the cloud chart. Each form has its own advantages and disadvantages, and its value is realized only when a rational choice is made. The choice mainly depends on whether the form helps us observe the data better. Normally, a visualized graph consists of several basic parts, including the primary area, graphics primitives and the legend. The primary area, the most important part, is the main board on which the graph is drawn, and it is square or rectangular in shape. If visualized graphs are organized differently, the content delivered to the user differs accordingly. Take bar diagrams and maps as examples. The benchmark of a bar diagram is a rectangular coordinate system, whose vertical and horizontal axes represent the metrics and the types respectively. In contrast, the benchmark of a map is the geographic map, on which metrics are presented in different colors. It follows that data must be organized and converted differently for each form of visualization. For example, when a bar chart is used to show the sales volume and sales amount of a commodity, the key elements can be expressed abstractly and used to form a visualized analysis model. Then, by combining data visualization technology with Hadoop, a visualized analysis model for Hadoop business data can be established more conveniently [5].

Figure 1. The basic framework of Hadoop.
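As a minimal sketch of how the key elements of a bar chart might be expressed abstractly, the following Python fragment converts raw sales rows into a chart model with categories on the horizontal axis and one metric series per legend entry. The class name, field names and sample rows are hypothetical illustrations and are not taken from the paper's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BarChartModel:
    title: str        # shown in the primary area
    categories: list  # horizontal axis: the types
    series: dict      # legend entry -> metric values on the vertical axis


def build_bar_chart(rows, category_key, metric_keys, title):
    """Aggregate raw rows into per-category totals for each metric."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for metric in metric_keys:
            totals[row[category_key]][metric] += row[metric]
    categories = sorted(totals)
    series = {m: [totals[c][m] for c in categories] for m in metric_keys}
    return BarChartModel(title, categories, series)


# Hypothetical sample data: sales volume and amount per commodity.
rows = [
    {"commodity": "tea", "volume": 3, "amount": 30.0},
    {"commodity": "rice", "volume": 5, "amount": 20.0},
    {"commodity": "tea", "volume": 2, "amount": 20.0},
]
chart = build_bar_chart(rows, "commodity", ["volume", "amount"], "Sales by commodity")
print(chart.categories, chart.series)
```

A map would need a different conversion (region to color value), which is why the data has to be organized separately for each form of visualization.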

Study on Visualized Analysis Model for Hadoop Business Data
The study is based on the Hadoop cloud computing platform. Once this foundation is defined, the visualized analysis model for Hadoop business data can be designed and established. In an enterprise, business data is stored in a relational database, so it must first be transferred to HDFS. After the transfer, the Hive data warehouse can be constructed. Once the data is in Hive, it is analyzed and the analysis results are stored in the HBase database. Finally, the results are combined with the existing visualization model to form the visualized analysis model for Hadoop business data.

Data Integration of Visualized Analysis Model for Hadoop Business Data
Data integration for the visualized analysis model is the process of extracting valuable business data from the enterprise database and storing it in Hadoop HDFS. Normally, this includes two stages: a full import of the original data and subsequent incremental imports. In this experiment, Sqoop was used for the data import; see Figure 2 for the Sqoop structure. Sqoop belongs to the Hadoop ecosystem, and its role is to transfer data between Hadoop and relational databases. In particular, Sqoop can transfer the valuable data in a relational database to HDFS [5].
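The two import stages could be driven as sketched below. The helper only builds the argument list for a `sqoop import` call and does not run it. The JDBC URL, user name, table, check column and target directory are hypothetical, and password handling is deliberately left out. The paper does not state which Sqoop options were used in the experiment, so this is one plausible setup rather than the author's configuration.

```python
def sqoop_import_args(jdbc_url, table, target_dir, username,
                      check_column=None, last_value=None):
    """Build the argument list for a `sqoop import` call.

    Without check_column the whole table is imported (full import).
    With check_column, only rows whose check_column value is greater
    than last_value are appended (incremental import).
    """
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
    ]
    if check_column is not None:
        args += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", str(last_value),
        ]
    return args


# Hypothetical connection details, for illustration only.
URL = "jdbc:mysql://db.example.com/shop"
full = sqoop_import_args(URL, "orders", "/data/orders", "etl")
incr = sqoop_import_args(URL, "orders", "/data/orders", "etl",
                         check_column="order_id", last_value=1000)
print(" ".join(incr))
```

The resulting list can be handed to `subprocess.run` on a machine where Sqoop is installed; the last imported value has to be remembered between runs so that the next incremental import starts where the previous one stopped.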

Study on Transformation and Storage of Hadoop Business Data
After the business data is fully imported via Sqoop, its format is converted to better satisfy the quality requirements on business data. In the visualized analysis model for Hadoop business data, two conversions are used: field combination and field split.
Field combination: the imported business data sets are independent of each other, without any connection [6], and combining them directly would cause conflicts. Therefore, a temporary Hive table is established and all business data is first imported into it. The fields are then combined via HiveQL, which ensures that the combined business data is consistent and free of conflicts.
Field split: a time attribute carries relatively complete time information. If detailed and efficient queries are required, the time field must be split into independent fields that contain only the year, the month and the day. The procedure is as follows: first, design a temporary Hive table and a partitioned table, and import the business data into the temporary table; then split the time field via HiveQL and import the split fields into the partitioned table [7].
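The field split can be illustrated outside Hive with a few lines of Python. In the paper the split is done in HiveQL (Hive provides `year()`, `month()` and `day()` functions for this purpose); the sketch below only mirrors the idea. The record layout, the field name `order_time`, the timestamp format and the warehouse path are assumptions made for the example.

```python
from datetime import datetime


def split_time_field(record, time_field="order_time", fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of the record with year, month and day split out."""
    ts = datetime.strptime(record[time_field], fmt)
    return {**record, "year": ts.year, "month": ts.month, "day": ts.day}


def partition_path(record, base="/warehouse/orders"):
    """Hive-style partition directory that the split record would fall into."""
    return f"{base}/year={record['year']}/month={record['month']}/day={record['day']}"


# Hypothetical row from the temporary table.
row = split_time_field({"order_id": 7, "order_time": "2018-03-09 14:05:00"})
print(partition_path(row))
```

Because each partition is a separate directory, a query restricted to one year, month or day only has to read the matching directories, which is the source of the efficiency gain described above.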

Study on Visualized Data Analysis and Visualized Model Establishment for Hadoop Business Data
After business data is imported into the Hive database, the administrator designs the themes for visualized analysis according to the enterprise's demands. Theme-based visualized analysis uses valid data to determine the basic type and storage structure of the business data, from which a theme type with visual meaning is finally formed. The design of the key is important because data visualization places high requirements on response speed. A Category 1 key follows the pattern "analysis theme + time of formation". The analysis theme distinguishes the statistical analysis results, while the time of formation identifies when a statistical analysis result was produced.
Table 1 shows the key design. The number of columns under a key is not fixed and can be set according to the actual business data. The columns mainly store the attribute set of the analysis model. The member property identifies exactly one record in a query result.
Compared with the Category 1 key, the Category 2 key is more complicated and more detailed. It follows the pattern "analysis theme + member property + time of formation". Category 1 and Category 2 share the same column family, but their columns differ, and each column stores an attribute value of the query result.
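The two key patterns can be sketched as simple string builders. The delimiter, the theme names, the member values and the date format below are hypothetical, since the paper does not specify them. The prefix filter emulates, in plain Python, how keys that share a theme and member property sit next to each other in a lexicographically sorted key space.

```python
SEP = "|"  # hypothetical delimiter; the paper does not specify one


def category1_key(theme: str, formed_at: str) -> str:
    """Key pattern: analysis theme + time of formation."""
    return SEP.join([theme, formed_at])


def category2_key(theme: str, member: str, formed_at: str) -> str:
    """Key pattern: analysis theme + member property + time of formation."""
    return SEP.join([theme, member, formed_at])


def scan_prefix(keys, prefix):
    """Emulate a prefix scan over lexicographically sorted keys."""
    return [k for k in sorted(keys) if k.startswith(prefix)]


keys = [
    category1_key("monthly_sales", "20180301"),
    category2_key("sales_by_store", "store_02", "20180301"),
    category2_key("sales_by_store", "store_01", "20180301"),
    category2_key("sales_by_store", "store_01", "20180401"),
]
store_01 = scan_prefix(keys, "sales_by_store|store_01|")
print(store_01)
```

Putting the analysis theme first keeps all results of one theme contiguous, and placing the time of formation last lets a single prefix retrieve every result for a given member in time order.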

Analysis on Hadoop Cluster Configuration
The feasibility of the visualized analysis model for Hadoop business data can be verified by experiment. To run the experiment, a Hadoop cloud computing platform is established in the computer network center, and the Hadoop cluster, hardware configuration, system version, software version and relevant parameters are determined [7].

Verification Experiment of Visualized Analysis Model for Hadoop Business Data
The purchase-sell-stock management platform of small and micro enterprises was used to verify the visualized analysis model for Hadoop business data. See Figure 3 for the structure of the Hadoop cloud computation system. After the experiment, the results must be analyzed, because the Hadoop cloud computing platform uses the HBase database both to store the data analysis results and to establish the data visualization model [9].
In addition, because visualization of Hadoop business data places high requirements on response speed, the experiment must be repeated to test the query performance of HBase and compare the results. Table 2 lists the time spent on five repeated queries and on a common query.
The experiment and the table lead to the following conclusions. The first query takes a long time because the connection between the client and the cluster must be set up. The time a query takes is also strongly affected by the network and the cluster status. After the connection is set up, the query time becomes stable and only milliseconds are needed. The experiment therefore shows that HBase can satisfy the requirements.
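A repeated-query measurement of this kind could be structured as in the sketch below. It does not talk to HBase: `FakeClient` is a hypothetical stand-in whose first call pays a simulated connection cost, and the delays are arbitrary values chosen for illustration, not the measurements reported in Table 2.

```python
import time


class FakeClient:
    """Stand-in for a store client; the first query pays a connection cost."""

    def __init__(self, connect_cost=0.05, query_cost=0.001):
        self.connect_cost = connect_cost  # simulated connection setup, seconds
        self.query_cost = query_cost      # simulated per-query latency, seconds
        self.connected = False

    def query(self, key):
        if not self.connected:
            time.sleep(self.connect_cost)  # one-off setup on first use
            self.connected = True
        time.sleep(self.query_cost)
        return key


def time_queries(client, key, repeats=5):
    """Run the same query several times and return each elapsed time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        client.query(key)
        timings.append(time.perf_counter() - start)
    return timings


timings = time_queries(FakeClient(), "monthly_sales|20180301")
first, rest = timings[0], timings[1:]
print(f"first: {first * 1000:.1f} ms, later mean: {sum(rest) / len(rest) * 1000:.1f} ms")
```

Reporting the first query separately from the later ones keeps the one-off connection cost from distorting the steady-state query time.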
The analysis of Hadoop technologies, data visualization technology and the experiments shows that the visualized analysis model for Hadoop business data is feasible and will play a positive role in practical applications [10].

Conclusion
Based on the analysis above, this paper presented a detailed study of the visualized analysis model for Hadoop business data. The model is used to analyze the special features of business data and to study data visualization technology.

Figure 2. The model for Hadoop business data.

Figure 3. The structure of Hadoop cloud computation system.

Table 1. Key and column family.