Hierarchical Clustering Approach for Region Analysis of Contraceptive Users

The Family Planning (KB) Program, which promotes contraceptive use to limit births, is one of the government's efforts to control the rate of population growth. Klaten Districts is one of the regencies in Central Java Province with a relatively high number of births and relatively low coverage of active family planning. This study aimed to group the sub-districts of Klaten Districts by contraceptive use in 2020 and to describe the characteristics of each group. The method used was hierarchical cluster analysis, and the centroid method was selected as the best linkage method. Three clusters were obtained: cluster 1 consists of 23 sub-districts, cluster 2 of 2 sub-districts, and cluster 3 of 1 sub-district. Based on the contraceptive method with the most users, cluster 1 is characterized by contraceptive injections, cluster 2 by contraceptive implants, and cluster 3 by IUDs.


Introduction
Population explosion is a critical problem in developing countries, including Indonesia. A large population and an uncontrolled growth rate lead to declining health, low education levels, and rising unemployment. One of the government's efforts to suppress population growth and create small, happy, and prosperous families is the Family Planning (KB) Program [1]. One indicator of the program's achievement is an increasing number of family planning acceptors [2]. Couples use contraception for explicit reasons: family planning, delaying pregnancy, spacing pregnancies, or not wanting more children. Contraceptive devices and methods are one way to regulate pregnancy. They are classified as long-term or short-term. Long-term contraceptive methods (MKJP) include the intra-uterine device (IUD), the contraceptive implant, male sterilization, and female sterilization. Non-MKJP methods include contraceptive injections, pills, and condoms [3].
Based on 2020 data from the Central Java Provincial Health Office, the number of births in Klaten Districts is relatively high, at 16,061. The coverage of active family planning in Klaten Districts falls in the low category.
This study examines the use of contraceptive methods in the sub-districts of Klaten Districts. If the people of Klaten Districts have not made full use of contraceptive methods, family planning cadres in each sub-district can promote the government program again. This research is therefore intended as a recommendation for related parties designing policies to reduce the number of births, tailored to the characteristics of contraceptive use in each sub-district.
This research uses the hierarchical clustering method [4]. Hierarchical clustering is appropriate because the number of objects is below 100: the study covers only 26 sub-districts. We use agglomerative hierarchical clustering, which includes several linkage methods: single linkage, complete linkage, average linkage, the centroid method, and Ward's method. The best of these five methods for clustering the data is then determined.

Materials
The data used in this study are secondary data obtained from the Central Statistics Agency of Klaten Districts. They record the number of contraceptive users in Klaten Districts in 2020. Table 1 shows the variables used in this study.

Table 1. Variables used in this study
Variable | Description | Unit
IUD | A small T-shaped plastic and copper device that is inserted into the uterus. | People
Pill | A pill containing synthetic versions of the female hormones estrogen and progesterone, which are produced naturally in the ovaries. | People
Condom | An ultra-thin latex sheath worn on the penis. | People
Male sterilization | A clinical procedure that stops a man's reproductive capacity by severing the vas deferens to prevent pregnancy permanently. | People
Female sterilization | A clinical procedure that stops a woman's fertility by cutting, tying, or placing a ring on the fallopian tubes. | People
Contraceptive injection | An injection that releases the hormone progestogen into a woman's bloodstream to prevent pregnancy. | People
Contraceptive implant | A small contraceptive rod inserted under the skin of a woman's upper arm. | People

The variables in Table 1 are used to group the sub-districts in Klaten Districts.

Clustering
Clustering is a process of grouping data into several clusters [6]. Cluster analysis aims to identify groups of objects that share similar characteristics and are distinct from other groups. Objects in the same cluster are therefore relatively more homogeneous than objects in different clusters.

Clustering Analysis Assumptions
In conducting cluster analysis, there are two critical issues that the researcher must focus on, namely [7,8]:

Representativeness of The Sample
The sample must genuinely represent the population. There is no fixed rule for the number of samples needed, but the sample must still be large enough for the clustering process to be carried out properly. Whether the sample represents the population can be checked with the Kaiser-Meyer-Olkin (KMO) value, an index that compares correlation coefficients with partial correlation coefficients:

$$\mathrm{KMO} = \frac{\sum_{i=1}^{p}\sum_{j \neq i} r_{ij}^{2}}{\sum_{i=1}^{p}\sum_{j \neq i} r_{ij}^{2} + \sum_{i=1}^{p}\sum_{j \neq i} a_{ij}^{2}}$$

where $p$ is the number of variables, $r_{ij}$ is the correlation coefficient between variables $i$ and $j$, and $a_{ij}$ is the partial correlation coefficient between variables $i$ and $j$. A KMO value below 0.5 indicates that the sample does not represent the population.
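
As a minimal sketch, assuming NumPy and an array with one row per object and one column per variable, the overall KMO value can be computed from the correlation matrix. The partial correlations $a_{ij}$ are taken from the inverse correlation matrix, which is one standard way to obtain them. The function name `kmo` is illustrative only.

```python
import numpy as np

def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin value for an (n_objects, p_variables) array."""
    r = np.corrcoef(data, rowvar=False)        # p x p correlation matrix r_ij
    inv_r = np.linalg.inv(r)                    # inverse correlation matrix
    # Partial correlation of i and j given all other variables:
    # a_ij = -inv_r[i, j] / sqrt(inv_r[i, i] * inv_r[j, j])
    diag = np.diag(inv_r)
    a = -inv_r / np.sqrt(np.outer(diag, diag))
    off_diag = ~np.eye(r.shape[0], dtype=bool)  # both sums run over j != i only
    r2 = np.sum(r[off_diag] ** 2)
    a2 = np.sum(a[off_diag] ** 2)
    return float(r2 / (r2 + a2))
```

Variables driven by a common factor have large correlations but small partial correlations, which pushes the KMO value toward 1.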

Impact of Multicollinearity
The variables should be free of multicollinearity, that is, a linear relationship among the independent variables. Multicollinearity can be detected with the variance inflation factor (VIF):

$$\mathrm{VIF}_{j} = \frac{1}{1 - R_{j}^{2}}$$

where $R_{j}^{2}$ is the coefficient of determination obtained by regressing variable $j$ on the other variables. A VIF value above 10 indicates multicollinearity [9].
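
A minimal NumPy sketch of the VIF calculation. It regresses each variable on the others with an intercept and applies the formula above. The function name `vif` is hypothetical.

```python
import numpy as np

def vif(data: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2) for each column j of an (n, p) array."""
    n, p = data.shape
    out = np.empty(p)
    for j in range(p):
        y = data[:, j]
        # Design matrix: intercept plus every variable except j
        X = np.column_stack([np.ones(n), np.delete(data, j, axis=1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

The same values equal the diagonal of the inverse correlation matrix, which is a convenient cross-check.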

Hierarchical Clustering
Hierarchical clustering builds a hierarchy, a branching tree-like structure. Grouping starts with the two objects that are most similar, and the process then extends to the other objects, so the clusters grow like a tree. The tree shows clear levels between objects, from the most similar to the least similar. Hierarchical clustering algorithms are divided into two types: divisive and agglomerative.
Agglomerative hierarchical clustering is a bottom-up method. It starts with each of the n data objects as a separate cluster and successively merges clusters until a single cluster remains. The following are several methods of agglomerative hierarchical clustering [6].

Single Linkage Method
The single-linkage method (also called the nearest-neighbor method) defines the similarity between two clusters as the shortest distance from any object in one cluster to any object in the other. The steps of the single linkage method are as follows:
a) Find the minimum distance $d_{UV} = \min_{i,k} d_{ik}$ and merge the corresponding clusters $U$ and $V$ into cluster $(UV)$.
b) Calculate the distance between the cluster formed in step a) and the other clusters.
c) The distance between $(UV)$ and any other cluster $W$ is calculated by the formula
$$d_{(UV)W} = \min\{d_{UW}, d_{VW}\}$$
where $d_{UW}$ and $d_{VW}$ are the shortest distances between clusters $U$ and $W$ and between clusters $V$ and $W$, respectively.
The clustering results can be displayed graphically as a dendrogram, or tree diagram. The number of branches at a given cut level corresponds to the number of clusters.
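
The single-linkage steps can be sketched as a naive agglomeration over a distance matrix. This is an illustrative implementation written for readability (the name `single_linkage` is hypothetical), not the routine used in the study. Libraries such as SciPy provide efficient equivalents.

```python
import numpy as np

def single_linkage(dist):
    """Naive single-linkage agglomeration (illustrative, roughly O(n^3)).

    dist: symmetric (n, n) distance matrix between objects.
    Returns a list of (members_U, members_V, d_UV) tuples, one per merge.
    """
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)                 # an object is never merged with itself
    clusters = {i: [i] for i in range(len(d))}  # active cluster index -> member objects
    history = []
    while len(clusters) > 1:
        ids = list(clusters)
        # Step a): the pair of active clusters (U, V) with minimum distance d_UV
        u, v = min(((a, b) for i, a in enumerate(ids) for b in ids[i + 1:]),
                   key=lambda pair: d[pair])
        history.append((clusters[u], clusters[v], float(d[u, v])))
        # Steps b) and c): d_(UV)W = min(d_UW, d_VW) for every other cluster W
        for w in ids:
            if w not in (u, v):
                d[u, w] = d[w, u] = min(d[u, w], d[v, w])
        clusters[u] = clusters[u] + clusters[v]  # row/column u now represents (UV)
        del clusters[v]
    return history

x = np.array([0.0, 1.0, 3.0, 7.0])
merges = single_linkage(np.abs(np.subtract.outer(x, x)))
```

For the four points 0, 1, 3, and 7 on a line, the merge distances are 1, 2, and 4.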

Complete Linkage Method
The complete-linkage method (also known as farthest-neighbor or diameter method) is comparable to the single-linkage algorithm, except that cluster similarity is based on maximum distance between observations in each cluster. The steps in the complete linkage method are the same as the single linkage method. The difference is in calculating the distance between clusters.
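According to [7], the distance between cluster $(UV)$ and any other cluster $W$ is
$$d_{(UV)W} = \max\{d_{UW}, d_{VW}\}$$
where $d_{UW}$ and $d_{VW}$ are the distances between the farthest objects of clusters $U$ and $W$ and of clusters $V$ and $W$, respectively.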
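
To see how the complete-linkage rule differs from the single-linkage rule, the sketch below computes both inter-cluster distances directly from all pairwise Euclidean distances, using SciPy's `cdist`. The helper `cluster_distance` and the small point sets are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_distance(a, b, rule):
    """Distance between clusters a and b, each of shape (n_objects, n_vars)."""
    pairwise = cdist(a, b)              # Euclidean distance for every pair (i in a, k in b)
    if rule == "single":
        return float(pairwise.min())    # nearest neighbours
    if rule == "complete":
        return float(pairwise.max())    # farthest neighbours
    raise ValueError(f"unknown rule: {rule}")

U = np.array([[0.0, 0.0]])
V = np.array([[1.0, 0.0]])
W = np.array([[4.0, 0.0], [5.0, 0.0]])
UV = np.vstack([U, V])                  # merged cluster (UV)
```

Here the complete-linkage distances are $d_{UW} = 5$ and $d_{VW} = 4$, and the distance from the merged cluster $(UV)$ to $W$ is $5 = \max\{5, 4\}$. The single-linkage distance from $(UV)$ to $W$ is $3 = \min\{4, 3\}$.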

Average Linkage Method
The average linkage procedure differs from the single-linkage and complete-linkage procedures in that the similarity of two clusters is the average similarity of all individuals in one cluster with all individuals in the other. The formula for the average method is
$$d_{(UV)W} = \frac{\sum_{i}\sum_{k} d_{ik}}{N_{(UV)}\,N_{W}}$$
where $d_{(UV)W}$ is the distance between cluster $(UV)$ and cluster $W$, $d_{ik}$ is the distance between object $i$ in cluster $(UV)$ and object $k$ in cluster $W$, and $N_{(UV)}$ and $N_{W}$ are the numbers of individuals in clusters $(UV)$ and $W$.

Centroid Method
In the centroid method, the similarity between two clusters is the distance between their centroids. A cluster centroid is the vector of mean values of the observations on the clustering variables. Each time two clusters are merged, the centroid of the new cluster is recalculated, and the process continues until the final clusters are formed. The advantage of this method is that outliers affect it less than they affect the other methods. The centroid of cluster $C$ is obtained with
$$\bar{\mathbf{x}}_{C} = \frac{1}{n_{C}} \sum_{i \in C} \mathbf{x}_{i}$$
where $n_{C}$ is the number of objects in cluster $C$.
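
A minimal sketch contrasting the average-linkage and centroid distances between two clusters, assuming NumPy and SciPy. The function names are hypothetical, and the point sets are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def average_linkage(a, b):
    """Sum of all between-cluster distances d_ik divided by N_a * N_b."""
    return float(cdist(a, b).sum() / (len(a) * len(b)))

def centroid_linkage(a, b):
    """Euclidean distance between the cluster centroids (variable-wise means)."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

a = np.array([[0.0, 0.0], [2.0, 0.0]])
b = np.array([[4.0, 3.0]])
print(average_linkage(a, b), centroid_linkage(a, b))
```

For these clusters, the average-linkage distance is $(5 + \sqrt{13})/2 \approx 4.30$. The centroid distance is the distance from $(1, 0)$ to $(4, 3)$, that is $\sqrt{18} \approx 4.24$.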

Ward's Method
Ward's method differs from the previous methods in that the similarity between two clusters is not a single measure of similarity. It is instead the within-cluster sum of squares, summed over all variables. The calculation uses the squared Euclidean distance between objects:
$$d_{(UV)W} = \frac{(n_{U} + n_{W})\,d_{UW} + (n_{V} + n_{W})\,d_{VW} - n_{W}\,d_{UV}}{n_{U} + n_{V} + n_{W}}$$
where $n_{U}$, $n_{V}$, and $n_{W}$ are the numbers of objects in clusters $U$, $V$, and $W$. The terms $d_{UV}$, $d_{UW}$, and $d_{VW}$ are the distances between clusters $U$ and $V$, $U$ and $W$, and $V$ and $W$, respectively, and $d_{(UV)W}$ is the distance between the merged cluster $(UV)$ and cluster $W$.
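
The update formula can be checked against a direct expression. This assumes that singleton distances start as squared Euclidean distances. Under that assumption, the update reproduces $\frac{2 n_A n_B}{n_A + n_B}\lVert\bar{\mathbf{x}}_A - \bar{\mathbf{x}}_B\rVert^2$, which is twice the increase in the within-cluster sum of squares caused by merging clusters $A$ and $B$. The function names are hypothetical.

```python
import numpy as np

def ward_update(d_uw, d_vw, d_uv, n_u, n_v, n_w):
    """Update formula for Ward's method (squared Euclidean inputs)."""
    return ((n_u + n_w) * d_uw + (n_v + n_w) * d_vw - n_w * d_uv) / (n_u + n_v + n_w)

def ward_distance(a, b):
    """Direct Ward distance on the same scale as the update when singletons start
    at squared Euclidean distance: 2 * n_a * n_b / (n_a + n_b) * ||mean_a - mean_b||^2,
    i.e. twice the increase in within-cluster sum of squares from merging a and b."""
    n_a, n_b = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return 2.0 * n_a * n_b / (n_a + n_b) * float(diff @ diff)

def sq_dist(p, q):
    """Squared Euclidean distance between two singleton clusters."""
    return float(np.sum((p - q) ** 2))

u = np.array([[0.0, 0.0]])
v = np.array([[1.0, 0.0]])
w = np.array([[3.0, 1.0]])
via_update = ward_update(sq_dist(u, w), sq_dist(v, w), sq_dist(u, v), 1, 1, 1)
direct = ward_distance(np.vstack([u, v]), w)
```

For the singletons $u = (0, 0)$, $v = (1, 0)$, and $w = (3, 1)$, both routes give $29/3$.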

Cophenetic Correlation Coefficient
The cophenetic correlation coefficient is one way to determine the best clustering method. It is the correlation coefficient between the elements of the original dissimilarity matrix (the Euclidean distance matrix) and the elements produced by the dendrogram (the cophenetic matrix). The formula for the cophenetic correlation coefficient is as follows [10]:
$$r_{coph} = \frac{\sum_{i<j} (d_{ij} - \bar{d})(c_{ij} - \bar{c})}{\sqrt{\sum_{i<j} (d_{ij} - \bar{d})^{2} \sum_{i<j} (c_{ij} - \bar{c})^{2}}}$$
where $r_{coph}$ is the cophenetic correlation coefficient, $d_{ij}$ is the Euclidean distance between objects $i$ and $j$, $\bar{d}$ is the average of $d_{ij}$, $c_{ij}$ is the cophenetic distance between objects $i$ and $j$, and $\bar{c}$ is the average of $c_{ij}$.
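
A sketch of the formula using SciPy. `linkage` builds the dendrogram from the raw observations, and `cophenet` returns the cophenetic distances $c_{ij}$ in condensed form. The correlation is then computed explicitly to mirror the formula, although `cophenet(Z, Y)` also returns the same coefficient directly. The wrapper name `cophenetic_corr` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

def cophenetic_corr(x, method):
    """Pearson correlation between Euclidean distances d_ij and cophenetic distances c_ij."""
    d = pdist(x)                        # condensed Euclidean distances, one entry per pair i < j
    z = linkage(x, method=method)       # dendrogram built from the raw observations
    _, c = cophenet(z, d)               # c_ij: height at which i and j are first joined
    dd, cc = d - d.mean(), c - c.mean()
    return float(np.sum(dd * cc) / np.sqrt(np.sum(dd ** 2) * np.sum(cc ** 2)))
```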

Results and Discussions
Descriptive Statistics
Descriptive statistics summarize the data in an organized way by describing the variables and the relationships between them in a sample or a population. Calculating descriptive statistics is an essential first step in research and should always be done before making inferential statistical comparisons [11]. Descriptive statistics for each variable are presented in Table 2. Based on Table 2, injection was the contraceptive method most used by the population of Klaten Districts in 2020, with 28,176 users. It was followed by implants, with 24,906 users, and the least used method was male sterilization, with only 277 users.
Utara is the sub-district with the most users of IUDs, pills, and condoms, whereas the fewest users of these methods are in Kebonarum, Jatinom, and Karangnongko, respectively. The number of male sterilization users varies considerably across sub-districts: it is highest in Kemalang and lowest in Klaten Tengah. In contrast, female sterilization users are relatively evenly distributed across sub-districts, with the highest number in Wonosari and the lowest in Kebonarum. The highest numbers of contraceptive injection and implant users are in Tulung and Jatinom, respectively, and the lowest are in Kemalang and Klaten Tengah, respectively.
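
Summaries like those in Table 2 can be produced with a few NumPy calls. The sketch assumes an array with one row per sub-district and one column per Table 1 variable. The function name `describe` and the example counts are hypothetical, not the study's data.

```python
import numpy as np

VARIABLES = ["IUD", "Pill", "Condom", "Male sterilization",
             "Female sterilization", "Injection", "Implant"]

def describe(counts, subdistricts):
    """Per-variable total, mean, sample standard deviation, and the sub-districts
    with the highest and lowest counts (first occurrence wins on ties)."""
    counts = np.asarray(counts, dtype=float)
    summary = {}
    for j, var in enumerate(VARIABLES):
        col = counts[:, j]
        summary[var] = {
            "total": col.sum(),
            "mean": col.mean(),
            "std": col.std(ddof=1),
            "max": subdistricts[int(col.argmax())],
            "min": subdistricts[int(col.argmin())],
        }
    return summary

# Hypothetical counts for three made-up sub-districts
example = describe([[1, 2, 3, 4, 5, 6, 7],
                    [7, 6, 5, 4, 3, 2, 1],
                    [2, 2, 2, 2, 2, 2, 2]], ["S1", "S2", "S3"])
```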

Clustering Assumptions
This study uses population data, so the data are representative by definition and no KMO test is needed [12]. Before conducting the hierarchical cluster analysis, we tested for multicollinearity. According to [13], multicollinearity is indicated if the VIF value of an independent variable exceeds 10. The multicollinearity test determines which distance measure to use. Two distance measures are considered: Euclidean and Mahalanobis. The Euclidean distance is used if the independent variables show no multicollinearity, whereas the Mahalanobis distance is used if they do. Based on the VIF values of the independent variables in Table 3, no VIF value exceeds 10. Thus, there is no indication of multicollinearity among the independent variables in the data used.
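
The decision rule above can be sketched as follows, assuming NumPy and SciPy. The sketch uses the identity that each VIF equals the corresponding diagonal element of the inverse correlation matrix. It switches to SciPy's Mahalanobis metric only when some VIF exceeds the threshold of 10. The function name `distance_matrix` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist

def distance_matrix(x, vif_threshold=10.0):
    """Return (metric name, condensed distance matrix) chosen by the VIF rule."""
    corr = np.corrcoef(x, rowvar=False)
    vifs = np.diag(np.linalg.inv(corr))   # VIF_j = j-th diagonal of the inverse correlation matrix
    if vifs.max() > vif_threshold:
        # Multicollinearity: Mahalanobis distance accounts for correlated variables
        vi = np.linalg.inv(np.cov(x, rowvar=False))
        return "mahalanobis", pdist(x, metric="mahalanobis", VI=vi)
    # No multicollinearity: plain Euclidean distance
    return "euclidean", pdist(x, metric="euclidean")
```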

The Results of Clustering Analysis
We use the cophenetic correlation coefficient to determine the best clustering method. The closer the coefficient is to 1, the better the clustering produced by that method [14]. Table 4 shows that the centroid method has the highest cophenetic correlation coefficient. Therefore, the centroid method is the best clustering method for these data.
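
Comparing the five methods by cophenetic correlation can be sketched with SciPy's `linkage` and `cophenet`. The input would be the sub-district-by-variable array. The function name `best_method` is hypothetical, and on other data the centroid method need not come out on top.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

METHODS = ["single", "complete", "average", "centroid", "ward"]

def best_method(x):
    """Return the linkage method with the highest cophenetic correlation, plus all scores."""
    d = pdist(x)  # Euclidean distances between objects
    scores = {m: float(cophenet(linkage(x, method=m), d)[0]) for m in METHODS}
    return max(scores, key=scores.get), scores
```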
The dendrogram for the data on contraceptive users in Klaten Districts by sub-district in 2020 is shown in Figure 8 [15]. In Figure 8, cluster 1 is marked with a green line, cluster 2 with a blue line, and cluster 3 with a red line. The researchers set the number of clusters to three. There is no fixed rule for choosing the number of clusters, and three clusters were used to group the sub-districts by the level of use of each contraceptive method. The cluster membership of the sub-districts for data on contraceptive users in Klaten Districts in 2020 is presented in Table 5.
Clustering groups each entity with others that have similar attributes, so each cluster has a characteristic profile that distinguishes it from the others. Table 6 summarizes the characteristics of each cluster. It shows that cluster 1 has the highest average number of contraceptive injection users of all clusters. Its numbers of IUD, pill, condom, male sterilization, and female sterilization users are moderate, and its number of contraceptive implant users is the lowest. In contrast, cluster 2 has the highest numbers of contraceptive implant, male sterilization, and female sterilization users and the lowest numbers for the other methods. In cluster 3, most users rely on IUDs, pills, contraceptive implants, contraceptive injections, and condoms, and male sterilization is the least used method. The distribution map of the clustering results in Klaten District is presented in the following figure.
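
Once a method is chosen, the dendrogram can be cut into a fixed number of clusters, and each cluster can be profiled by its variable means, as in Table 6. The following is a minimal sketch using SciPy's `fcluster` with the `maxclust` criterion. `cluster_profiles` is a hypothetical helper, and the test data are synthetic. Because the centroid method can produce non-monotonic merge heights, it is worth checking how many clusters are actually returned.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_profiles(x, n_clusters=3, method="centroid"):
    """Cut the dendrogram into at most n_clusters groups and average each variable per group."""
    z = linkage(x, method=method)
    labels = fcluster(z, t=n_clusters, criterion="maxclust")  # labels start at 1
    profiles = {int(k): x[labels == k].mean(axis=0) for k in np.unique(labels)}
    return labels, profiles
```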

Conclusion
Among the hierarchical clustering methods compared, the centroid method is the best for the data in this study. The study obtained three clusters, each with its own characteristics. In cluster 1, the most widely used contraceptive method was the contraceptive injection, and the least used was the contraceptive implant. In cluster 2, the contraceptive implant was the most used method. In cluster 3, several contraceptive methods were widely used, namely IUDs, pills, contraceptive implants, contraceptive injections, and condoms.