Application of Neural Network-Based Oxide Deposition Models to CMP Modeling

Numerous deposition processes are used in modern semiconductor manufacturing, including high-density plasma chemical vapor deposition (HDP-CVD), spin-on dielectric (SOD), flowable CVD (FCVD), and enhanced high aspect ratio processes (eHARP). Generation of high-quality post-deposition surface profiles is crucial for chemical-mechanical planarization (CMP) model building, due to the complex nature of the CMP process and long-range effects in CMP. Measurements show complicated post-deposition surface profile height dependence on the underlying pattern for these deposition processes. While high-quality compact models exist for HDP-CVD and SOD processes, building compact models for FCVD and eHARP is a challenge, since they include several deposition and annealing steps, and show complicated behavior with respect to the underlying pattern. In this paper, we present the application of neural networks (NNs) to post-deposition surface profile modeling for the above-mentioned deposition processes. Experiments showed that NNs should have at least two hidden layers and 6–10 neurons per hidden layer to capture the complexity of deposited surface profiles. The application of NNs to modeling surface profiles of deposition processes has shown that NNs provide a general approach for modeling surface profiles of deposition processes without long-range effects for CMP modeling, irrespective of the complexity of the deposition process.

In multi-level integrated circuit (IC) manufacturing, achieving a desired level of wafer surface planarity is critical before processing the next layer, to avoid topographical margin issues. To achieve both local and global planarity of the wafer surface, chemical-mechanical polishing or planarization (CMP) technology is used. The CMP process removes any excess of conductive and dielectric materials from a silicon wafer using both chemical reactions and mechanical forces to smooth and flatten the wafer surface. This process uses a chemical slurry combined with a polishing pad and retaining ring to remove material and even out irregular topography, producing a planar wafer surface. 1,2 It is well known that CMP is very sensitive to the pattern geometry of a wafer. Uneven distribution of device structures may cause dishing of metal lines and erosion of dielectrics of a layer during planarization, which can result in critical or killer defects in chips. Recent developments in CMP have led to a dramatic improvement in the planarity of dielectrics, as well as dual damascene copper metallization, driving the need for better models that more accurately simulate these higher-quality pre-CMP dielectric profiles.
The two main phases of IC fabrication are front-end-of-line (FEOL) and back-end-of-line (BEOL) creation. The FEOL stage is when the substrate architecture is built, such as electrical isolation structures, transistors, capacitors, resistors, etc. The BEOL process connects all of these integrated devices, and forms the essential logic and memory circuits. A variety of deposition technologies are used to create layers of different materials, such as oxides, glasses, conductors, and nitrides, for both the FEOL and the BEOL. The goal is to achieve a constant density across the deposited layers to bring the whole surface within the depth of focus (DOF) of a photolithography system, and to avoid any erosion and dishing hotspots that can create shorts and opens in the metal interconnects and over- and under-polishing of transistor gates. Construction of transistors in recent technology nodes requires additional FEOL layers not present in the past. Each deposition layer has its own irregularities. As layers are placed on top of each other, these irregularities add up, resulting in complex variations for the higher layers and affecting transistor gates. 3,4 In BEOL, multiple metallization layers are placed on top of each other, each requiring good planarity to achieve high manufacturing yield. The CMP topography variation increases for higher metal layers, due to multi-layer topography accumulations that can result in larger inter- and intra-level fluctuations. Large profile variation in the underlying layers may lead to shorts or opens in the higher metal layers that are difficult to predict before manufacturing.
E-mail: ruben_ghulghazaryan@mentor.com
Modeling of post-CMP profiles for multiple layers to identify potential CMP hotspots (areas susceptible to dishing or erosion) in FEOL and BEOL layers has become an important tool used by many IC manufacturing companies as part of their design for manufacturing (DFM) flow to predict planarity hotspots prior to manufacturing. 5 The high cost of lithography, driven by stringent DOF requirements, the use of double and triple litho-etch patterning, and the introduction of high-k metal gate (HKMG) technology 3 with extra CMP steps, has contributed to an increased interest in CMP modeling. Various wafer-scale and feature-scale CMP models have been developed to predict the evolution of the wafer surface during the polishing process. 1,2 This increased interest in and use of CMP modeling has dramatically amplified the need for accuracy in CMP models.
The generation of a high-quality pre-CMP surface profile is a crucial part of building a CMP model. Even with advanced deposition processes, the pre-CMP (post-deposition) profile on a patterned wafer is non-uniform, and may contain large variations that can affect surface planarity after CMP. Analysis of three-dimensional (3D) atomic force microscopy (AFM) and transmission electron microscopy (TEM) data shows complicated deposition profile height dependence on the underlying pattern geometry for high-density plasma chemical vapor deposition (HDP-CVD), spin-on dielectric (SOD), flowable CVD (FCVD), and enhanced high aspect-ratio process (eHARP) deposition technologies. While shallow trench isolation (STI) and CMP modeling of FEOL layers have shown successful application of HDP-CVD and SOD deposition models to CMP modeling, building physics-based or compact models for FCVD and eHARP processes is challenging, since these processes include several deposition and annealing steps to fill up trenches.
Recently, machine learning (ML) has found a wide range of applications in different areas of modern life and industry. In IC manufacturing, ML and feed-forward neural networks (NNs) have introduced new and exciting opportunities for developing high-accuracy models for complex and advanced deposition processes. An NN consists of layers of artificial "neuron" data processing elements: an input layer, several hidden processing layers, and an output layer, with weighted connections between the layers. 6,7 Training an NN means finding the values of the weights connecting the layers that best fit the training and validation data, by minimizing the root mean square (RMS) error or another cost function. Once an NN is trained, it can make high-quality predictions for new data 6,7 (data that was not used in training).
Sensitivity analysis of measurement data combined with ML algorithms shows that the post-deposition surface profile of the above-mentioned deposition processes depends primarily on the underlying local pattern geometry, while long-range effects are secondary. This allows us to apply NNs to the modeling of the pre-CMP surface profile, using as input the geometric characteristics of the underlying pattern. 8,9 In this paper, we apply an NN-based full-chip deposition model for predicting post-deposition profile heights and geometry characteristics for use in CMP modeling. We determined that, given the geometric characteristics of pre-deposition profiles, NN configurations with two hidden layers and 6-10 neurons per hidden layer are sufficient to accurately predict the erosion, dishing, and width data that characterize the surface profile. First, we applied NN models to simulate pre-CMP surface profiles of HDP-CVD and SOD processes, for which high-quality compact models are available for comparison. Then, we used NN models for FCVD and eHARP pre-CMP surface profile modeling. For these processes, NN training is performed using training data derived from test chips, until a small RMS error and high correlation (up to 99%) to training data is achieved. A fit with about 97-98% correlation is also obtained for the validation data. This level of accuracy from the NN-based deposition models is sufficient to produce trends for the post-deposition surface profiles that designers can study to make modifications to their layout that will increase the yield.
The results presented in this paper show that the use of NNs for modeling surface profiles of deposition processes may become a general approach for modeling surface profiles after deposition in CMP modeling, regardless of the complexity of the deposition technology. We conclude with a discussion of challenges and perspectives on the given approach.

Experimental
CMP modeling.-CMP has been a key component of chip manufacturing for more than two decades, and is used to improve wafer planarity. By measuring different aspects of the layout, CMP modeling attempts to predict the planarity as a chip is built up over multiple layers, and find areas in the design with a higher-than-average probability of developing post-CMP defects. A CMP model is developed by first extracting the underlying geometrical properties of the pattern from the layout, then generating a pre-CMP surface profile after etch and numerous deposition steps, and finally predicting the post-CMP surface profile of the layout. 10 CMP simulation begins with geometric data extraction. First, the design is divided into fixed-size tiles, for which average geometrical characteristics like width, space, pattern density, and perimeter are derived. Then, the "effective trench" (ET) approximation is used to model each tile's surface profile dynamics with time. In the ET approximation, each tile represents a trench with width, space, pattern density, and perimeter geometric characteristics derived from the design, as well as height data representing the heights of material inside ($Z_T$) and outside ($Z_{NT}$) the trench. 10 The average geometric characteristics of a pattern for each tile are determined and passed to etch, deposition, and CMP models (Fig. 1).
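As a rough illustration of the geometric extraction step, the following sketch computes the four tile-averaged characteristics for a single tile of parallel lines. The function name `tile_features`, the interval representation of the lines, and the assumption that every line spans the full tile height are hypothetical simplifications chosen for this example; they are not the extraction procedure of the tools used in this work.

```python
def tile_features(lines, tile_size):
    """Average geometric characteristics of one square tile.

    lines: sorted, non-overlapping (x_start, x_end) intervals of vertical
           lines, each assumed to span the full tile height (hypothetical).
    tile_size: edge length of the tile, in the same units as the intervals.
    """
    if not lines:
        raise ValueError("tile contains no lines")
    widths = [x1 - x0 for x0, x1 in lines]
    # Gaps between neighbouring lines inside the tile.
    gaps = [b[0] - a[1] for a, b in zip(lines, lines[1:])]
    covered = sum(widths)
    return {
        "width": covered / len(widths),
        # With a single line, fall back to the uncovered extent of the tile.
        "space": sum(gaps) / len(gaps) if gaps else tile_size - covered,
        # Covered area / tile area; the common tile height cancels out.
        "density": covered / tile_size,
        # Each line is a (width x tile_size) rectangle.
        "perimeter": sum(2 * (w + tile_size) for w in widths),
    }


# Four 1-unit lines separated by 1-unit spaces in a 10-unit tile.
features = tile_features([(1, 2), (3, 4), (5, 6), (7, 8)], tile_size=10)
```

In a full-chip flow, a dictionary like this would be produced for every tile and passed on to the etch, deposition, and CMP models.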
During CMP simulation, etch, deposition, and polishing models simulate the changes in $Z_T$, $Z_{NT}$, and the geometric data for each tile. In a CMP simulation flow, the pre-CMP surface profile data is generated by either a post-deposition model or data from the previous polishing step. Because a post-deposition profile is always used as input for the first CMP step, it is essential to develop a set of deposition models that correspond to the deposition processes used for a given technology, to ensure correct input profiles are generated for CMP simulation and reliable CMP modeling results.
An important step in developing an accurate CMP model is to use measurements from test chips. A test chip usually consists of periodically placed test patterns of parallel trenches of different widths, separated by differing spaces. It is important that the size of the test chip and the number of structures are chosen in such a way that comprehensive coverage of the width, space, perimeter, and pattern density values supported by the technology node is provided, without violating any design rule checks (DRC). Thus, the test chip should have structures that contain test patterns with narrow and wide width and space, as well as low to high density combinations. An AFM scanner or other profiler tool is normally used to collect erosion and dishing data from line scans over test patterns (Fig. 2). The information about the layer stack and material thicknesses can be used to convert erosion and dishing data into $Z_T$ and $Z_{NT}$ surface profile height data.

Modern Deposition Processes
Due to the complicated nature and long-range effects of the CMP process, a high-quality pre-CMP surface profile is crucial when developing a high-precision CMP model. Even the application of leading-edge deposition processes does not guarantee a uniform post-deposition (pre-CMP) profile on a patterned wafer. The profile may still contain relatively large fluctuations that affect planarity, even after CMP. A detailed analysis of 3D AFM and TEM data shows that the relationship between heights in a pre-CMP profile and the underlying pattern geometry may be complicated.
HDP-CVD.-Initially used for STI, the HDP-CVD process is now widely used for deposition of various oxides in the chip manufacturing industry. Simultaneous deposition and ion sputtering processes during HDP-CVD create triangular and trapezoidal shapes over active areas. The geometry of these shapes changes with the underlying active area pattern, and results in variations in the thickness of deposited oxide.
SOD.-The SOD deposition process coats the wafer with oxide material in a liquid form. A pre-determined amount of liquid oxide is dispensed onto the wafer surface, followed by rapidly spinning the wafer. Centrifugal force enables the uniform distribution of the liquid over the wafer surface and the filling of trenches. After this process, the oxide material is solidified by a low-temperature bake.
FCVD.-Developed by Applied Materials, 8 the FCVD process deposits a high-quality liquid-like dielectric film on the wafer surface, allowing the film to easily flow into gaps, filling them without voids or seams. Compared to plasma CVD, FCVD demonstrates a better deposition profile and bottom-up gap-filling capability. The FCVD process has two main stages: deposition of an oxide layer in liquid form and its conversion to solid oxide. The deposition stage is the most important for filling gaps, and the deposition reaction is kept at a sufficiently low temperature to maximize flowability. The second stage, the conversion to solid oxide, is essential for forming a stable oxide.

Table I. Input and output layers of the NN.
Input Layer: Width, Space, Pattern Density, Perimeter
Output Layer: Erosion, Dishing, Width
eHARP.-eHARP is a non-plasma-based CVD oxide film deposition process that addresses gap filling requirements for STI at the 4x nm node and beyond. 11 This process may be used before HDP-CVD to fill up narrow trenches.
FCVD and eHARP processes are too complex to develop a physics-based or compact model with a reasonable runtime and accuracy for CMP modeling. In this paper, we use ML to develop NN-based models for these deposition processes to use as input for CMP modeling.

NN-Based CMP Models
To test the viability and precision of applying ML to generate NN-based deposition models, we carried out experiments based on the following four deposition processes: HDP-CVD, SOD, FCVD, and eHARP. Calibre CMP ModelBuilder and CMPAnalyzer tools were used to extract local geometric characteristics of a pattern (width, space, pattern density, and perimeter) from the design layout, and to generate input for the NN-based model. Width, space, pattern density, and perimeter were used as input to the NN, and erosion, dishing, and width were defined as output (Table I).
For the problem considered here, data for deposition process modeling must be extracted from a specially designed test chip, which severely limits the amount of data available for training and validation. To avoid over-fitting, the simplest NN configuration that fits the data should be selected. The more hidden layers and neurons per layer an NN has, the more weights must be determined from the training data. The number of weights $N_w$ of a fully connected NN has the form
$$N_w = \sum_{i=0}^{L} n_i \, n_{i+1},$$
where $L$ is the number of hidden layers and $n_i$ is the number of neurons in layer $i$. Here $n_0$ corresponds to the number of input neurons and $n_{L+1}$ to the number of output neurons. We see that the number of weights $N_w$ increases significantly with each extra hidden layer. To avoid over-fitting, it is essential that $N_w$ be much smaller than the number of training data points.
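To make the growth of the weight count concrete, the sketch below counts the connection weights between consecutive layers of a fully connected NN. The helper `count_weights` is hypothetical, and whether the count used in this work also includes bias terms is not stated in the text, so both variants are shown. The third configuration (three hidden layers of ten neurons) is an illustrative assumption.

```python
def count_weights(layer_sizes, include_biases=False):
    """Count trainable parameters of a fully connected feed-forward NN.

    layer_sizes: neurons per layer, from input layer to output layer.
    include_biases: also count one bias per hidden and output neuron.
    """
    # One weight per connection between consecutive layers.
    total = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    if include_biases:
        total += sum(layer_sizes[1:])
    return total


for sizes in ([4, 6, 6, 3], [4, 10, 10, 3], [4, 10, 10, 10, 3]):
    print(sizes, count_weights(sizes), count_weights(sizes, include_biases=True))
# [4, 6, 6, 3] 78 93
# [4, 10, 10, 3] 170 193
# [4, 10, 10, 10, 3] 270 303
```

Adding a third hidden layer of ten neurons raises the connection count from 170 to 270, which illustrates why a limited test-chip data set favors the smaller configurations.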
In our experiments, we started from a single hidden layer NN for training. A single hidden layer NN with a reasonable number of neurons (< 60) was not able to fit both training and validation data well (see Results and Discussion), so we moved to more complicated NNs with two and three hidden layers. Experiments on NNs with three hidden layers demonstrated over-fitting, making NNs with two hidden layers optimal for modeling these deposition processes. After this evaluation, we selected a simple NN with four input neurons, two hidden layers with 6-10 neurons per hidden layer, and three output neurons (Fig. 3) for use with our given problems.
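A minimal sketch of the forward pass through such a network is given below, using only the Python standard library. The function names, the random weight initialization, the zero biases, and the assumption that inputs are pre-scaled to [0, 1] are illustrative choices, not details taken from this work; tanh is applied on every layer, including the output layer.

```python
import math
import random


def init_network(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected NN."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate inputs through all layers with tanh activations."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Four inputs, two hidden layers of six neurons, three outputs.
network = init_network([4, 6, 6, 3])
# Inputs: width, space, pattern density, perimeter (assumed scaled to [0, 1]).
outputs = forward(network, [0.2, 0.3, 0.4, 0.1])
```

The three values in `outputs` correspond to the scaled erosion, dishing, and width predictions; with untrained random weights they carry no physical meaning.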

Training and Validation Data Generation
Both training and validation data are required for training NNs. Training data is used to adjust the weights used in the NN, while validation data is used to minimize over-fitting. In contrast to usual NN applications, training and validation data for deposition processes modeling must be collected from specially designed test chips, as previously discussed. In this section, we examine obtaining the optimal design for test chips and generation of training and validation data.
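One common way to use validation data against over-fitting is early stopping: training is halted once the validation error stops improving. The text does not state which mechanism was used in this work, so the sketch below is an assumption, and the function name, the `patience` parameter, and the error values are hypothetical.

```python
def early_stopping_epoch(val_rms, patience=3):
    """Return (best_epoch, best_error) under a simple early-stopping rule.

    val_rms: validation RMS error recorded after each training epoch.
    patience: number of epochs without improvement tolerated before stopping.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_rms):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            # Validation error has not improved for `patience` epochs.
            break
    return best_epoch, best_error


# Hypothetical validation errors: improvement stalls after epoch 3.
history = [0.90, 0.50, 0.30, 0.25, 0.27, 0.30, 0.35, 0.20]
best_epoch, best_error = early_stopping_epoch(history, patience=3)
```

Here training would stop at epoch 6 and the weights from epoch 3 would be kept; the later value of 0.20 is never reached, which is the usual trade-off of this rule.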
To achieve a strong correlation between in-silicon surface data and NN-based model predictions, special care must be taken when collecting training data. The training data includes erosion, dishing, and width data that is collected from CMP test chips using AFM or other profiler scanner tools. The size of the test chip and the number of structures must be chosen so that they provide sufficient coverage of the width, space, perimeter, and pattern density values supported by the technology. Thus, the test chip should have structures that contain test patterns with narrow and wide width and space, as well as low to high density combinations.
To find the optimal number of test patterns in the test chip for generating training data, we carried out a series of experiments based on HDP-CVD and SOD compact models, using the Calibre CMP ModelBuilder tool with different test chip structures and NN configurations.
First, a test chip was designed for compact CMP model building. The test chip was manufactured at the foundry using the HDP-CVD and SOD processes, and erosion and dishing data were collected. Using this measured erosion and dishing data, compact models for HDP-CVD and SOD processes were built using the Calibre CMP ModelBuilder tool, as schematically shown in Fig. 4 and described in Ref. 10. Training NN-based deposition models requires more erosion, dishing, and width data than the corresponding compact Calibre CMP ModelBuilder deposition models. Thus, special test chip layouts must be designed to generate enough data for training and validation sets.
To obtain the optimal number of patterns on test chips for generating training and validation data, we carried out a series of experiments using the HDP-CVD and SOD compact models with different test chip structures (Fig. 5).
The simulated data showed that, starting from an initial relatively large width and/or space, the surface profile tends to a final value that remains almost the same, demonstrating that the number of structures with wide trenches and spaces may be limited to a few patterns on the test chip (Fig. 6).
This analysis suggests that the test chip structures should contain a variety of test patterns, with most having narrow width and space and only a few having wide width and space, as well as low to high density combinations. Fig. 7a shows the structure of the test chip, while Figs. 7b and 7c illustrate the distribution of the corresponding pattern density and width of patterns. Using training and validation data from the optimal test chips, experiments were performed to obtain the best NN configuration for NN-based HDP-CVD and SOD models, as discussed in the next section.
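One simple way to place most test patterns at narrow dimensions and only a few at wide ones is to sample widths and spaces on a geometric (log-spaced) grid. This is an illustrative assumption rather than the layout procedure used for the test chips described here; the function name and the 0.05 to 50 range are hypothetical.

```python
def geometric_grid(v_min, v_max, count):
    """Return `count` values from v_min to v_max with a constant ratio."""
    if count < 2 or v_min <= 0 or v_max <= v_min:
        raise ValueError("need count >= 2 and 0 < v_min < v_max")
    ratio = (v_max / v_min) ** (1.0 / (count - 1))
    return [v_min * ratio ** k for k in range(count)]


# Ten hypothetical line widths between 0.05 and 50 (arbitrary length units).
widths = geometric_grid(0.05, 50.0, 10)
# The same grid can be reused for spaces; each (width, space) pair
# defines one candidate test pattern.
patterns = [(w, s) for w in widths for s in widths]
```

With this grid, nine of the ten widths fall in the lower half of the range, so the region where the profile varies strongly is sampled densely while the saturated wide-feature region needs only a few patterns.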
However, for FCVD and eHARP deposition processes, only the data measured from the original test chips were available, since the optimal test chip layouts were not available at the time measurements were collected.

NN-based modeling of HDP-CVD and SOD surface profiles.-We first built NN-based models for the HDP-CVD and SOD processes. To obtain the best NN configuration for NN-based models, we combined measured and simulated data from the Calibre CMP ModelBuilder tool to perform a series of experiments with different numbers of hidden layers. For NN configurations, we considered NNs consisting of one, two, and three hidden layers. Multiple training runs were carried out for each configuration. We found that an NN consisting of one hidden layer fits the validation data poorly, especially the dishing data (see Fig. 8, left). On the other hand, NNs consisting of three hidden layers showed either over-fitting of the validation data, or the same accuracy as an NN with two hidden layers, but with much longer training time (Fig. 8, right). Thus, we determined that, for HDP-CVD and SOD, the optimal configuration in terms of the model complexity and training runtime is an NN with two hidden layers and six neurons per layer (Fig. 3a). As mentioned in the previous section, the training results for this NN demonstrated good fitting of training data, with a small RMS error and about 99% correlation between the results and the training data. Fig. 9 shows the validation data fitting for HDP-CVD and SOD models. The validation of the NN-based models was carried out on validation sets constructed from both compact model simulation data and measured data. We performed numerous experiments with different activation functions, and by examining the results of the validation data fitting, determined that tanh(x) returned the best results. In some cases, using linear activation functions for the output layer results in non-physical negative values, so we prefer to use tanh(x) as the activation function for the output layer as well.
It can be seen that the application of the tanh(x) function to the NN output layer and good quality training data led to correct erosion and dishing predictions (i.e., no non-physical (negative) erosion and dishing predictions) for these processes (Fig. 9).
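Because tanh(x) is bounded to (-1, 1), target values must be rescaled before training and network outputs mapped back afterwards. A linear mapping between [0, value_max] and [-1, 1], sketched below, is one way a bounded output layer can rule out negative predictions. The mapping, the function names, and the value_max of 200 are assumptions for illustration; the scaling actually used in this work is not described in the text.

```python
import math


def scale_target(value, value_max):
    """Map a non-negative target in [0, value_max] to [-1, 1]."""
    return 2.0 * value / value_max - 1.0


def unscale_output(tanh_output, value_max):
    """Map a tanh output in (-1, 1) back to the range (0, value_max)."""
    return 0.5 * (tanh_output + 1.0) * value_max


# Even a strongly negative pre-activation yields a non-negative prediction.
raw = -3.0  # hypothetical pre-activation of the dishing output neuron
dishing = unscale_output(math.tanh(raw), value_max=200.0)
```

A linear output neuron has no such bound, so an extrapolated prediction can fall below zero; the price of the bounded output is that predictions can never exceed the chosen value_max.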
As a simple application of NN-based models, one can study the erosion and dishing trends of HDP-CVD and SOD processes versus the geometric characteristics of a design pattern (Figs. 10, 11). In particular, Figs. 10a and 10b show the dependence of erosion and dishing on the width and space of the underlying pattern for the HDP-CVD process. It can be seen that, in structures with wide width and space, the erosion and dishing tend to saturation, whereas for narrow width and space structures, a non-trivial trend is found. A similar approach can be employed to study the dependence of erosion and dishing on the pattern density and perimeter of the underlying pattern, as shown in Figs. 11c, 11d.

NN-Based Modeling of FCVD and eHARP Surface Profiles
In this section, we discuss NN-based models for FCVD and eHARP processes. Analysis of the data shows that the surface profile after the FCVD process has a complicated structure with highly non-trivial dependence on the underlying pattern, as compared to the HDP-CVD and SOD processes. Thus, a more complicated NN is required to model the FCVD process. We selected an NN with two hidden layers and ten neurons per hidden layer for FCVD process model building (Fig. 3b).
In many practical applications, an eHARP layer is deposited prior to an HDP-CVD layer. In this work, training and validation data was generated for the HDP-CVD surface profile after eHARP, rather than using the raw eHARP surface profile. In contrast to FCVD, the HDP-CVD surface profile after eHARP shows simpler trends. Accordingly, for modeling the surface profile of the HDP-CVD after eHARP process, we selected a simpler NN with two hidden layers and six neurons per hidden layer (Fig. 3a).
Fitting curves of FCVD and HDP-CVD after eHARP training and validation data are shown in Fig. 12. Again, we achieved a small RMS error and good correlation (up to 91% and 95%, respectively). The average error is about 15-20%.
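The two fit-quality figures quoted here can be computed as in the sketch below. The text does not specify which correlation measure was used; the Pearson correlation coefficient is assumed, and the function names and sample values are hypothetical.

```python
import math


def rms_error(measured, predicted):
    """Root mean square of the differences between two equal-length lists."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(measured, predicted):
    """Pearson correlation coefficient (assumed correlation measure)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Hypothetical measured and predicted dishing values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
rms = rms_error(measured, predicted)
corr = pearson(measured, predicted)
```

Both numbers should be reported separately for the training and validation sets, since a large gap between them is the usual sign of over-fitting.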
One of the advantages of NN-based modeling is the opportunity to perform an error estimation of the predicted data based on the training data fitting. Error bars on the validation set in Fig. 13 indicate errors in the fit of the NN-based model at given points. Relatively large errors indicate that the given combination of input data is not well sampled with training data. The fit can be improved by adding the points showing poor accuracy to the training data, and retraining the network.
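The retraining loop described above can be sketched as follows: validation points whose prediction error exceeds a tolerance are flagged and moved into the training set. The relative-error criterion, the 20% tolerance, the function name, and the sample values are assumptions for illustration; the error estimate behind the error bars in Fig. 13 is not specified in the text.

```python
def flag_poor_points(samples, rel_tol=0.2):
    """Return (inputs, measured) pairs whose relative error exceeds rel_tol.

    samples: list of (inputs, measured, predicted) tuples.
    """
    flagged = []
    for inputs, measured, predicted in samples:
        # Guard against division by zero for vanishing measurements.
        scale = max(abs(measured), 1e-12)
        if abs(predicted - measured) / scale > rel_tol:
            flagged.append((inputs, measured))
    return flagged


# Hypothetical validation points: (width, space) inputs and dishing values.
validation = [
    ((0.1, 0.1), 100.0, 95.0),   # 5% error
    ((0.1, 5.0), 40.0, 55.0),    # 37.5% error, poorly sampled region
    ((10.0, 10.0), 20.0, 21.0),  # 5% error
]
training_additions = flag_poor_points(validation, rel_tol=0.2)
```

Moving flagged points into the training set reduces the data left for validation, so in practice new measurements near the flagged input combinations would be the cleaner remedy.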
Using NN-based models, surface plots of erosion and dishing versus width and space of the underlying pattern for the FCVD and HDP-CVD after eHARP processes are presented in Figs. 14 and 15. For the FCVD process, it can be seen that erosion and dishing of wide width and space structures tend to saturation, while a highly non-trivial trend is seen for erosion and dishing of narrow structures. Unlike the FCVD process, the dependence of erosion and dishing on width and space of the underlying patterns in the HDP-CVD after eHARP process looks much simpler. This may explain the popularity of this technology among manufacturers.

Conclusions
New, more accurate deposition models are needed to predict the surface profiles of today's advanced deposition processes in CMP modeling. This paper presents the application of NN-based full-chip deposition models for predicting post-deposition profile heights and geometry characteristics of deposited patterns in CMP modeling. We trained NNs with two hidden layers and 6-10 neurons per hidden layer to accurately predict the surface characteristics of the post-deposition profiles, including erosion, dishing, and width data, for HDP-CVD, SOD, FCVD, and eHARP deposition processes. We obtained a reasonable fit of both training and validation data with an NN using two hidden layers.
Due to cost and resource limitations, the number of available physical measurements is often very limited, and is insufficient for accurate model building. To address this problem, special test chips were designed with patterns covering possible width, space, and density combinations for collecting measurements. Weak long-range effects characteristic of HDP-CVD, SOD, FCVD, and eHARP processes make it possible to apply the patterns' local geometry characteristics to model surface profiles.
Results of the present research show that the application of NNs for modeling surface profiles of deposition processes can be generalized to any deposition process with weak long-range effects. However, the application of this methodology to processes with long-range effects, such as electrochemical deposition and CMP, is challenging. There are several open questions that must be answered. What type and configuration of NN will best take into account long-range effects for surface profile prediction? What should the input to such a network be? In the meantime, the use of NNs may be considered as a general approach for modeling surface profiles of deposition processes with weak long-range effects for CMP modeling, regardless of the complexity of the deposition technology.