Today my typing speed is exceptionally slow, fewer than a thousand words in two hours, so the update will be delayed, probably until around midnight or one or two o'clock. When it is up, just refresh this chapter to see it. By the way, it's already the twenty-eighth, but surprisingly this chapter's comments still haven't been opened; it seems we'll have to wait until next month for them to be unblocked.
...
Abstract: To ensure network security, a method for network security risk mining and estimation based on big data analysis is proposed. The Map and Reduce functions of the Hadoop platform are used to mine the association rules of network security events. The mined association rules serve as the features of network security events, and these features are the inputs to a Support Vector Machine with a radial basis kernel function. A network security risk estimation model is built through training, and the optimization capability of the quantum-behaved particle swarm optimization (QPSO) method is used to find the Support Vector Machine's optimal parameters. Experimental results show that this method improves the precision of network security risk estimation and provides a useful reference for defending against network security risks.
Keywords: Big data analysis; Network security risk; Association rules; Support Vector Machine
1 Introduction
Internet technology is developing rapidly, and the internet environment is highly open. Attackers exploit the uncertainty and diversity of the network to launch attacks, seriously threatening its safe operation [1-2]. Previous network defense methods used only the information contained in data packets to obtain risk estimation results, so their accuracy was low. To keep the network operating safely, network administrators need to understand the network status in real time, anticipate network security risks, and adopt appropriate defense measures against them [3-5]. Many researchers are currently studying network security risks. Han Xiaolu, He Chunrong, and others use intuitionistic fuzzy sets and attention mechanisms to assess network security status [6-7]. However, because of the large data volumes involved, network security risk assessment still suffers from excessive alarms and high false alarm rates. Mining useful network security risk data from massive network data is the key to precise network security risk assessment. When attack behaviors occur in the network, a multitude of alarm information of various types is generated, which makes data mining harder [8]. Efficient big data mining methods are therefore extremely important for improving the accuracy of network security risk assessment. This paper proposes a method for network security risk mining and estimation based on big data analysis, and tests and analyzes its performance.
2 Network security risk mining and estimation method based on big data analysis
2.1 Extraction of association rules in data mining
Security events are collected from massive network data. Because the formats of the collected events differ significantly, the events must be normalized before the association rules they contain can be mined. The mined rules are then used to analyze similar viruses [9], similar vulnerabilities, and other attack behaviors among network security risks, which improves the accuracy of network security risk assessment. The data mining methods of big data analysis technology are used to extract the association rules of network security events.
Let $W = \{w_1, w_2, \ldots, w_n\}$ denote the set of security event elements and $R = \{r_1, r_2, \ldots, r_n\}$ denote the dataset, where each element $r_i$ of $R$ is a set built from elements of $W$, i.e., $r_i \subseteq W$.
Definition 1: Let $C$ be an item set built from the elements appearing in $R$, and let $l$ be the number of elements $r_i$ of $R$ that satisfy $C \subseteq r_i$. The support of $C$ in the dataset is
$$\mathrm{sup}(C) = \frac{l}{n} \quad (1)$$
Definition 2: Given sets $C$ and $D$ satisfying $C \subseteq W$, $D \subseteq W$, and $C \cap D = \varnothing$, the confidence of the rule $C \to D$ is the ratio of $\mathrm{sup}(C \cup D)$ to $\mathrm{sup}(C)$. A rule $C \to D$ that satisfies both the minimum confidence and the minimum support in the mined dataset is an association rule to be mined by the big data mining method.
Association rules are obtained by mining the frequent item sets within the transaction set and identifying the rules that hold between different transactions. Network security events are extremely large in scale [10], so a cloud computing platform, Hadoop, is selected to mine association rules from massive network security events. The mining process is divided into two parts: (1) mining the frequent item sets that meet the minimum support; (2) using these frequent item sets to mine the association rules that meet the minimum confidence.
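The two definitions above can be illustrated with a small worked example. The following is a minimal sketch, assuming each normalized security event record is a set of event-type labels; the labels (`port_scan`, `brute_force`, `malware`) and the function names are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records r_i that contain every element of the item set (l / n)."""
    items = frozenset(itemset)
    hits = sum(1 for record in records if items <= record)
    return hits / len(records)


def confidence(c: Iterable[str], d: Iterable[str], records: List[FrozenSet[str]]) -> float:
    """Confidence of the rule C -> D, computed as sup(C ∪ D) / sup(C)."""
    sup_c = support(c, records)
    if sup_c == 0:
        return 0.0  # C never occurs, so the rule has no evidence
    return support(frozenset(c) | frozenset(d), records) / sup_c


# Hypothetical normalized security event records (the dataset R).
records = [
    frozenset({"port_scan", "brute_force"}),
    frozenset({"port_scan", "brute_force", "malware"}),
    frozenset({"port_scan"}),
    frozenset({"malware"}),
]

sup_scan = support({"port_scan"}, records)                        # 3 of 4 records
conf_rule = confidence({"port_scan"}, {"brute_force"}, records)   # (2/4) / (3/4)
```

Here `port_scan` appears in three of the four records, and two of those three also contain `brute_force`, so the rule `port_scan -> brute_force` has a confidence of two thirds.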
The Hadoop platform uses the Map function and the Reduce function to obtain item subsets and to assess the support of the subsets obtained. By analyzing the support of all subsets, the support of the frequent items among the mined network security events is obtained, and the frequent item sets contained in the network security event dataset are identified. Frequent item set mining on the Hadoop platform takes the minimum support β and the original network security event dataset R as input, and outputs the frequent items that meet the minimum support.
Map Task:
(1) According to the input file path, divide the original network security dataset into n data subsets and format each subset into key-value pairs, where the value is the data information and the key is the character offset;
(2) Read the key-value pairs of the different subsets with the Map function, parse the data information in value with the split function, and pass the parsed result to the set;
(3) Use each subset as an output key and set its value to 1;
(4) Call the optional Combine function. The Map ends generate many key-value pairs with the same key in the network security data; merging identical keys through the Combine function avoids the low computational efficiency caused by sending every key-value pair to the Reduce ends over the network.
Reduce Task:
(1) Sort the key-value pairs sent by the Combine function, merge the pairs that have the same key, read the merged pairs with the Reduce function, and accumulate the values within them. The resulting support count of a key in the network security dataset R is the global support of that candidate frequent item set at the Reduce end;
(2) Write the candidate item sets that meet the minimum support to an external storage table, and query this table to obtain the frequent item sets, which become the input files of the next MapReduce program.
Association rule mining takes the minimum confidence δ as input and outputs the association rules that satisfy δ. The computation proceeds as follows:
(1) The Map function calls its setup method to connect to the database;
(2) Divide the frequent item sets stored in the external table into n data subsets and format all data into key-value pairs;
(3) Parse the elements of the frequent item sets in value; each parsed value is represented as (C, D, SValue), and the resulting (C, D) is stored in the set;
(4) For each subset C of a frequent item set, read its support sup(C) and use it to compute the confidence of C→D;
(5) When the confidence exceeds the preset threshold, an association rule holds between this subset and the remaining elements of the frequent item set; the subset and the difference set form the key, and the confidence of the rule is the value.
The process above mines the association rules of network security events; network security risk is then estimated from the mined rules with the support vector machine method.
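The Map, Combine, and Reduce steps described above can be imitated in a single process to make the data flow concrete. The following is a minimal sketch, not Hadoop code: it assumes records are sets of event labels, enumerates candidate subsets up to a hypothetical `max_size`, and uses hypothetical function names and event labels. The thresholds 0.3 and 0.7 mirror the minimum support and minimum confidence reported in Section 3.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def map_phase(split: List[Itemset], max_size: int) -> List[Tuple[Itemset, int]]:
    """Map: emit (candidate subset, 1) for every subset of each record."""
    pairs = []
    for record in split:
        for k in range(1, max_size + 1):
            for subset in combinations(sorted(record), k):
                pairs.append((frozenset(subset), 1))
    return pairs


def combine_phase(pairs: List[Tuple[Itemset, int]]) -> Counter:
    """Combine: merge identical keys locally before they reach the reducer."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return local


def reduce_phase(partials: List[Counter], n_records: int, min_support: float) -> Dict[Itemset, float]:
    """Reduce: sum the partial counts and keep item sets meeting the minimum support."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {k: c / n_records for k, c in total.items() if c / n_records >= min_support}


def mine_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """For each frequent item set, emit rules C -> D with conf = sup(C ∪ D) / sup(C)."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                c = frozenset(antecedent)
                conf = sup / frequent[c]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((c, itemset - c, conf))
    return rules


# Hypothetical dataset R, divided into two splits as a stand-in for HDFS blocks.
splits = [
    [frozenset({"port_scan", "brute_force"}), frozenset({"port_scan", "brute_force", "malware"})],
    [frozenset({"port_scan"}), frozenset({"malware"})],
]
partials = [combine_phase(map_phase(s, max_size=2)) for s in splits]
frequent = reduce_phase(partials, n_records=4, min_support=0.3)
rules = mine_rules(frequent, min_confidence=0.7)
```

With these toy records the only rule that clears both thresholds is `brute_force -> port_scan`. Enumerating every subset in the mapper is exponential in record size, so a real job would cap the subset size or generate candidates level by level.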
2.2 Network Security Risk Estimation Method
The mined association rules are used as the features of network security events, and network security risk is estimated from these rules. Let $(x_i, y_i)$, $i = 1, \ldots, N$, denote the training sample set of network security events, where $x_i$ is the sample input and $y_i$ the sample output, with $x_i \in R^n$ and $y_i \in R$. The samples are mapped to a high-dimensional feature space by a nonlinear mapping function $\varphi(\cdot)$, which gives the optimal linear regression function for network security event assessment:
$$f(x) = w^T \varphi(x) + b \quad (2)$$
where $b$ and $w$ are the bias and the weight, respectively. The solution of the LSSVM regression model is obtained under the structural risk minimization principle:
$$\min_{w, b, e} J(w, e) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N} e_i^2 \quad (3)$$
$$\text{s.t. } y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, \ldots, N \quad (4)$$
where $e_i$ is the error between the regression function and the actual result and $C$ is the penalty factor. Introducing Lagrange multipliers for the constraint in formula (4) gives
$$L(w, b, e, a) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N} e_i^2 - \sum_{i=1}^{N} a_i \left( w^T \varphi(x_i) + b + e_i - y_i \right) \quad (5)$$
where $a_i$ is the Lagrange multiplier. The kernel function is defined through the Mercer condition:
$$K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j) \quad (6)$$
The radial basis kernel function is selected as the kernel function for network security risk estimation:
$$K(x, x_i) = \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right) \quad (7)$$
The final support vector machine regression model is
$$f(x) = \sum_{i=1}^{N} a_i \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right) + b \quad (8)$$
where $\sigma$ is the width of the radial basis kernel function. The estimation precision of the support vector machine is determined by its parameters, so selecting appropriate parameters improves the accuracy of network security risk estimation. The QPSO algorithm is used to optimize the parameters of the support vector machine.
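The regression model above can be made concrete with a small numerical sketch. It assumes the common LSSVM formulation with objective ½‖w‖² + (C/2)Σeᵢ² and the radial basis kernel exp(−‖x − z‖²/(2σ²)); under that assumption the multipliers and bias follow from one linear system, [[0, 1ᵀ], [1, K + I/C]]·[b; a] = [0; y]. The function names, the toy data, and the values of C and σ are hypothetical, and the solver is plain Gaussian elimination so that the example needs no third-party library.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Radial basis kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_lssvm(X: List[List[float]], y: List[float], C: float, sigma: float) -> Tuple[float, List[float]]:
    """Return (b, a) from the linear system [[0, 1^T], [1, K + I/C]] [b; a] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / C if i == j else 0.0) for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]


def predict(x: Sequence[float], X: List[List[float]], b: float, a: List[float], sigma: float) -> float:
    """Evaluate f(x) = sum_i a_i K(x, x_i) + b."""
    return sum(ai * rbf(x, xi, sigma) for ai, xi in zip(a, X)) + b


# Hypothetical one-dimensional features and risk values, for illustration only.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 4.0, 3.0, 5.0]
C, sigma = 100.0, 1.0
b, a = fit_lssvm(X, y, C, sigma)
estimate = predict([2.5], X, b, a, sigma)
```

Because the constraints are equalities, training reduces to a linear solve rather than a quadratic program, which is the main practical appeal of the least-squares variant. The two parameters C and σ are exactly the quantities that the parameter search then has to tune.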
Let the QPSO swarm contain m particles in a D-dimensional search space, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ is the position of particle i, $PB = (pb_1, pb_2, \ldots, pb_d)$ is the current optimal position, and $GB = (gb_1, gb_2, \ldots, gb_d)$ is the global optimal position. The particle evolution expression is
$$x_i(t+1) = p_i \pm \beta \left| mbest - x_i(t) \right| \ln\frac{1}{u} \quad (9)$$
where $p_i$ is the attractor located between PB and GB, $u$ is a random number uniformly distributed in (0, 1), mbest is the mean of the individual best positions in the swarm, and $\beta$ is the coefficient governing the algorithm's convergence speed. At iteration t, $\beta$ is calculated as
$$\beta = 0.5 + 0.5 \times \frac{T_{max} - t}{T_{max}} \quad (10)$$
where $T_{max}$ is the maximum number of iterations.
The process of network security risk assessment is as follows:
(1) Set the number of particles in the swarm according to the scale of the network security risk assessment; the particle dimensions represent the support vector machine parameters C and σ, respectively;
(2) Set the parameters of the particle swarm algorithm used to optimize the support vector machine parameters, together with the maximum number of iterations;
(3) Evaluate the fitness function of each particle;
(4) Calculate each particle's individual optimal position and the global optimal position, and establish a network security information database;
(5) Update the position of each particle in the swarm;
(6) Iterate according to the process above and check whether the termination condition is met; if so, proceed to step (7), otherwise return to step (3);
(7) Use the optimal particle obtained as the support vector machine's parameters, which completes the network security risk estimation model; the established model is then used to obtain the network security risk estimation results.
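The search loop in steps (1) to (7) can be sketched as follows. This is a minimal illustration assuming the standard QPSO update, x ← p ± β·|mbest − x|·ln(1/u), with β decreasing linearly from 1.0 to 0.5; that schedule is a common choice and an assumption here. The fitness function is a hypothetical quadratic stand-in for the validation error of a support vector machine trained with parameters (C, σ), and the bounds, the optimum at (50, 5), and all names are made up for the example.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def qpso(
    fitness: Callable[[Sequence[float]], float],
    bounds: List[Tuple[float, float]],
    n_particles: int = 20,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimize `fitness` with quantum-behaved PSO; returns (best, best_fit, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for t in range(max_iter):
        beta = 0.5 + 0.5 * (max_iter - t) / max_iter  # assumed linear schedule
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                phi = rng.random()
                u = 1.0 - rng.random()  # in (0, 1], keeps log(1/u) finite
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - x[i][d]) * math.log(1.0 / u)
                moved = attractor + step if rng.random() < 0.5 else attractor - step
                x[i][d] = min(max(moved, lo), hi)  # clamp to the search bounds
            fit = fitness(x[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = x[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def toy_error(params: Sequence[float]) -> float:
    """Hypothetical stand-in for validation error as a function of (C, sigma)."""
    c, sigma = params
    return ((c - 50.0) / 50.0) ** 2 + ((sigma - 5.0) / 5.0) ** 2


best, best_fit, history = qpso(toy_error, bounds=[(1.0, 100.0), (0.1, 10.0)])
```

In an actual parameter search, `toy_error` would be replaced by a routine that trains the regression model with the candidate (C, σ) and returns its error on held-out data. The `history` list records the best fitness per iteration, which is the kind of convergence curve the case analysis refers to.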
3 Case Analysis
Sixty minutes of communication data from the operation of a communication network are selected as the test object. A total of 5,846,544 samples are collected, and the method proposed in this paper is used to assess network security risk. The intuitionistic fuzzy set method (reference [6]) and the attention mechanism method (reference [7]) are selected as comparison methods. The proposed method uses big data analysis technology to mine the association rules in the massive network communication data, and the number of association rules mined at different values of minimum confidence and minimum support is counted. The statistics are shown in Figure 1. As Figure 1 shows, a relatively large number of association rules can be mined with a minimum confidence of 0.7 and a minimum support of 0.3, so when massive network data are mined with the proposed method, δ and β are set to 0.7 and 0.3, respectively. The proposed method has high association rule mining performance and maintains high mining efficiency when applied to massive network communication data. After the association rules are mined, the optimization capability of the QPSO algorithm is used to obtain the optimal parameters of the support vector machine; the convergence of the QPSO algorithm over the iterations is shown in Figure 2. As Figure 2 shows, the QPSO algorithm needs only about 40 iterations to find the optimal support vector machine parameters for estimating network security risk. The QPSO algorithm therefore has high optimization efficiency and can obtain the optimal parameters of the support vector machine in a short time, which improves network security risk estimation performance.
The optimal parameters of the support vector machine obtained through the QPSO algorithm are C=130 and σ=135. The network security risk assessment model is established with these parameters and is used to evaluate the number of security risk incidents during 5 hours of network operation; the results are compared with those of the other two methods in Figure 3. As Figure 3 shows, the risk results estimated by the method proposed in this paper are highly similar to the actual network security risk results, and their fluctuation trends agree closely. The comparison indicates that the proposed method can effectively predict network security risks and that its predictions are highly reliable, so they can serve as effective evidence for network administrators managing network security. The network security risk assessment performance of the three methods over multiple tests is compared in Figure 4. As Figure 4 shows, the proposed method effectively mitigates shortcomings such as the large amount of historical data required and high sensitivity to missing data, which indicates high reliability when it is applied to network security risk assessment. Table 1 shows the security risk conditions of a test network from 7:00 to 24:00 on January 3, 2020, as evaluated by the proposed method. Based on the network security events in Table 1, the proposed method is used to evaluate the attack types of the risk events, with the results shown in Table 2.
The analysis of Table 2 shows that the method proposed in this paper can evaluate security risk events and effectively identify the specific attack behaviors behind them, which confirms its validity in assessing security risk events.
4 Conclusion
Network security risk estimation is a crucial part of the current network security defense system. As data volumes within networks increase, higher requirements are placed on network security risk estimation. By fully considering the attack situation during network operation and applying big data analysis technology to network security risk estimation, the method exploits the advantages of massive data processing to fully mine the association rules within network security events and to estimate network security risks. Experimental verification shows that the studied method achieves effective network security risk estimation and helps protect network security in an environment with massive data.