Simulations of Massive Star Formation through Parallel Computing (AMR)
Application
In astrophysics, massive stars remain a significant unsolved problem. Massive stars form in dense regions of large turbulent molecular clouds. With masses greater than 10 solar masses, they dominate their environment through prodigious UV output and dense stellar winds. After lifetimes of only several hundred thousand years, they end in violent supernovae that disperse heavy elements throughout the interstellar medium.
Computationally, massive stars present a formidable challenge. A robust treatment of hydrodynamics is necessary to deal with the supersonic shocks of the molecular cloud turbulence. An accurate gravity treatment is necessary to model the collapse and accretion of matter onto forming protostars. At an early stage, the deuterium in the protostar ignites requiring sophisticated treatment of radiation transfer and radiation pressure. Most importantly, the gravitational instabilities that cause collapse of these massive objects result in density changes of 10s of orders of magnitude. In general, the size of the domain, complexity of the physical processes, and the need for high resolution make this problem ideal for parallelization.
Below I will discuss the AMR code developed at LLNL and used by the Berkeley Astrophysical Fluid Dynamics group to model massive star formation [1]. Due to the complexity of the code, I will limit the discussion to the AMR, hydro, and gravity components of the code.
Parallel Computing
Algorithm
Adaptive mesh refinement (AMR) is a technique that has been under development for 15 years beginning with work by Berger, Oliger, and Collela [2]. The algorithm starts with a uniform Cartesian grid placed on the computational domain. In the course of the calculation, additional grids are selectively added to the domain based upon specified refinement criteria (e.g. density gradients between adjacent cells). Each level is passed to an integrator and separately evolved at a smaller time step than its parent grid. At then end of each course time step, a correction procedure makes adjustments at interfaces to ensure conservation. Finer grids that are no longer necessary are de-refined. The picture below shows the areas of refinement in a turbulent run.

Parallelization
Since the domain is naturally broken up into a series of grids, parallelization is easily implemented by distributing grids among the number of processors. Load balancing is accomplished by using an approximate knapsack algorithm [3]. The average load balance inefficiency is given below, where N is the number of grids and K is the number of processors. The more grids per processors, the better the efficiency of the algorithm [4].

Targeted Machines
The AMR code used by the Berkeley Astrophysical fluids group is currently running on the machines:
|
Name |
System |
Rank[5] |
Facility |
|---|---|---|---|
|
Thunder |
Intel Itanium2 Tiger4 1.4Ghz-Quadrics |
5 |
LLNL |
|
MCR |
Linux Cluster Xeon 2.4GHz-Quadrics |
19 |
LLNL |
|
Seaborg |
SP Power3 375 MHz 16 way |
21 |
NERSC/LBNL |
|
DataStar |
sServer pSeries 655/690 (1.5/1.7 Ghz Power4+) |
25 |
UCSD/San Diego Supercomputer Center |
Scalability and Performance
The scalability results shown below are taken from tests on DataStar, where N is the number of processors and the numbers are in CPU micro seconds per cell cycle advance. The last 3 columns are normalized to 16 processors.
|
N |
hydro |
gravity |
total |
hydro |
gravity |
total |
|
16 |
33.8 |
14.8 |
48.6 |
1 |
1 |
1 |
|
32 |
32.7 |
15.0 |
47.7 |
0.97 |
0.94 |
0.98 |
|
64 |
31.8 |
18.3 |
50.1 |
0.94 |
1.24 |
1.03 |
|
128 |
39.0 |
27.3 |
66.3 |
1.15 |
1.84 |
1.36 |
For up to 64 processors, the scaling is excellent for both hydro and gravity. MFLOP/s numbers for these runs are unavailable.
Results Obtained
One recent publication of results run on the DataStar and Seaborg platforms simulated accretion unto sink particles (representative of massive protostars) in a turbulent medium[6]. In a non-turbulent medium, an analytic solution for gravitational infall gives the rate of accretion onto a spherical object. The simulations showed that in an environment of supersonic turbulence, the process is more variable and the mass accretion rate can be much lower.

The plot shows the xy slice through the origin of a run with vorticity of w=1. The red arrows indicate velocity.[6]
Comments
Pros:
For runs with a large density spectrum, the AMR is an efficient griding algorithm that avoids the prohibitive time constraints of griding the entire domain to the max resolution.
For medium sized domains, hydro and gravity scale well up to 64 processors. This allows efficient simulation of turbulent molecular clouds (AMR is much better at capturing shocks than comparable codes like SPH).
Cons:
Due to different exponential laws, this code is becoming more inefficient, such that half the CPU time is spent elsewhere in the code. This bottleneck is probably caused by communication between processors.
Other non-local processes in the code, such as radiation transfer, take 10 times longer to solve for than hydro, thus significantly decreasing the speed of a cell-cycle advance.
References
[1] http://astron.berkeley.edu/~cmckee/bafd/index.html
[2] M. J. Berger and J. Oliger, Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations, J. Comput. Phys., 53, pp. 484-512, 1984.
[3] http://seesar.lbl.gov/AMR/
[4] http://seesar.lbl.gov/CCSE/Publications/car/ParHyper.pdf
[6] M. Krumholz, C. McKee, and R. Klein., Bondi Accretion in the Presence of Vorticity, astro-ph/0411526.