Jon Baker, Graphics Programming

    home

    writings

Oriented DLA for Growing Crystals

Introduction

  This is based on some of my prior work with the diffusion-limited aggregation algorithm, and some tips from Mattias Malmer. It's important background to understand that "diffusion" in the name of this algorithm refers to a simulated random diffusion process, where particles are randomly jittered around with Brownian motion until eventually they find themselves "close enough" to a growing set of anchored particles to join it. The "diffusion limit" part means that the growth continues in a way that is limited by this simulated diffusion process.
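  That core loop can be sketched as a toy, hypothetical 2D version (ToyDLA and its parameters are illustrative, not the actual simulation): a seed particle is anchored, and each new particle wanders with random jitter until it lands next to something anchored.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Toy 2D DLA: uniform "close enough" neighborhood, freeze at first valid location
struct ToyDLA {
	int N;                      // grid is N x N
	std::vector<char> anchored; // 1 if a particle is frozen here
	std::mt19937 rng{ 42 };

	explicit ToyDLA ( int n ) : N( n ), anchored( n * n, 0 ) {
		anchored[ ( N / 2 ) * N + ( N / 2 ) ] = 1; // seed particle at the center
	}
	bool at ( int x, int y ) const {
		return x >= 0 && x < N && y >= 0 && y < N && anchored[ y * N + x ];
	}
	bool nearAnchored ( int x, int y ) const { // is any of the 4 neighbors anchored?
		return at( x - 1, y ) || at( x + 1, y ) || at( x, y - 1 ) || at( x, y + 1 );
	}
	void addParticle () {
		std::uniform_int_distribution< int > spawn( 0, N - 1 ), step( -1, 1 );
		int x = spawn( rng ), y = spawn( rng );
		while ( true ) { // brownian jitter until the particle bonds
			if ( !at( x, y ) && nearAnchored( x, y ) ) {
				anchored[ y * N + x ] = 1; // freeze at first valid location
				return;
			}
			x = std::clamp( x + step( rng ), 0, N - 1 );
			y = std::clamp( y + step( rng ), 0, N - 1 );
		}
	}
	int count () const { int c = 0; for ( char a : anchored ) c += a; return c; }
};
```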

  This is the first application project for the big server CPUs in Susan, and was an excellent opportunity to get hands-on with multithreading and CPU sync primitives, and with how to guarantee correct operation in situations that would otherwise create unstable race conditions. This work has many threads to follow up on, and I'm quite interested to explore "polyatomic" multiply-modular units interacting. The individual crystal elements click together like Lego during the sim, due to the use of oriented bonding offsets. Introducing multiple particle types, particle colors, and bond polarity to the model would massively expand its capacity to create organic forms. The diffusion limit does even more heavy lifting here, too - with an eye to the possibilities, the sim could model the relative mixing of more than one particle type being fed into the sim, fluid dynamics, erosion of existing structure, you name it.

  A basic implementation of DLA, in the style of the toy computer science example problem, uses a uniform threshold for defining "close enough" and then freezes the particle at whatever location it occupies at that time. Rather than doing any further evaluation, it takes the first valid location. That's what my GPU implementation was doing. You can see that it grows in a blobby, coral-like, almost tree-like branching structure at a micro scale, which at a larger scale manifests as the diffusely textured surface you can see on copper cathode plates coming out of an electrical refining process. It can be interesting to watch it form, but it lacks much high-level structure, due to this uniform, unoriented local bonding behavior.

  But there's a whole world of much more specific, structured bonds which can occur; based on ionic relationships, based on interacting polarities of atomic orbitals, and the effective polarities of larger molecules. I used a model here based on oriented bonding offsets, which define a local unit of a crystal lattice. These are bonding locations which are made available to incoming particles, based on oriented offsets from the anchored particle. The anchored particle stores a mat4 with its position and orientation, so we can compute these offset locations on-the-fly, based on the mat4 of the nearest particle and some simulation parameters that define offsets about the origin. In the future I'm going to explore some specific forms in a more structured way; at this point, I was quite excited to get the framework laid down. There is a sparse grid based on a threadsafe templated hashmap I pulled off github, which I have set up to use GLM integer 3-vector keys for std::shared_ptr's pointing to individual grid cell management structures - more on the particulars later. There are some memory issues with this hashmap that I have not had a chance to analyze: it swells to many times the initial allocation even with a large initial pool. This becomes several gigabytes very quickly, and I may need to implement a more efficient solution myself. This is the prime hindrance to operating at larger scale, besides the amount of compute time required.
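  The offset computation can be sketched like so - plain column-major float arrays stand in for glm::mat4 and the GLM vector types here, and bondSites is a hypothetical name for illustration:

```cpp
#include <array>
#include <vector>

using Mat4 = std::array< float, 16 >; // column-major, matching GLM's layout
using Vec3 = std::array< float, 3 >;

// transform a point by the matrix (w = 1), applying rotation and translation
Vec3 transformPoint ( const Mat4& m, const Vec3& p ) {
	Vec3 r;
	for ( int i = 0; i < 3; i++ )
		r[ i ] = m[ 0 + i ] * p[ 0 ] + m[ 4 + i ] * p[ 1 ] + m[ 8 + i ] * p[ 2 ] + m[ 12 + i ];
	return r;
}

// candidate bond sites for one anchored particle - `offsets` are the
// simulation parameters defining the local lattice unit about the origin
std::vector< Vec3 > bondSites ( const Mat4& anchor, const std::vector< Vec3 >& offsets ) {
	std::vector< Vec3 > sites;
	sites.reserve( offsets.size() );
	for ( const Vec3& o : offsets )
		sites.push_back( transformPoint( anchor, o ) ); // oriented offset in world space
	return sites;
}
```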

  Some clippings here from the internet and from flipping through a chemistry encyclopedia at the library the other day suggest this is a reasonable way to model the modular nature of a crystal lattice. They mention something called "Bravais lattices", which I'll need to look into further. This expresses a family of regular repeating patterns related to crystallography, a subset of something more general called space groups, which gets into some very interesting material related to definitions of symmetry.

  For the experimentation so far, I limited myself to bonding offsets defined by scaled and rotated x, y, z basis vectors. Adjusting the angles between them significantly changes the behavior, and the relative scale of the offsets also makes significant changes. Including the positive and negative polarities of these basis vectors has a major impact on the simulation - I found that including only one polarity creates much more interestingly structured crystals, while including both polarities encourages more diffuse and blobby accretion. I want to explore this in a more structured way, using the 14 families of Bravais lattices, and generate crystal samples to look at, with information attached about their simulation.
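  That offset construction can be sketched as follows - the function name is hypothetical, and the rotation between the basis vectors is omitted for brevity (the angles are another simulation parameter):

```cpp
#include <array>
#include <vector>

using Vec3 = std::array< float, 3 >;

// bonding offsets from scaled basis vectors, with one or both polarities
std::vector< Vec3 > basisOffsets ( const Vec3& scale, bool bothPolarities ) {
	std::vector< Vec3 > offsets;
	for ( int axis = 0; axis < 3; axis++ ) {
		Vec3 o{ 0.0f, 0.0f, 0.0f };
		o[ axis ] = scale[ axis ];
		offsets.push_back( o );      // +x, +y, +z
		if ( bothPolarities ) {
			o[ axis ] = -scale[ axis ];
			offsets.push_back( o );  // -x, -y, -z as well
		}
	}
	return offsets;
}
```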

Resource Setup

Hashmap

  This is a concurrent hashmap I found on github. It works reasonably well, but again, I'm going to have to do some investigation into its memory usage. It is the limiting factor for large runs, and can pretty easily force you to spill into swap. I'm not sure where the actual memory usage is coming from: the bump allocator does all of its allocation upfront, and collision handling internal to the hashmap should not be inflating memory consumption like this. I will need to evaluate what is happening here if I want to do much more with it.

GridCell Struct

  This is the most heavily used data structure in the simulation; a pointer to one is the "value" associated with the ivec3 "keys" in the hashmap. This is a clean representation of a sparse grid, which I've used before. The hashmap returns the value for a given key, and provides a .find() and .insert() interface that is sufficient for an anchored particle to find out whether a GridCell already exists in a region of space before trying to create one, so it has a way to add itself to the anchored model.

  The GridCell contains a vector of pointers to mat4's. These mat4's represent the orientation and position of the particles contained within the cell. It also contains a std::shared_mutex, which lets us take a std::unique_lock when appending a mat4 allocated from the bump allocator to the vector, or a std::shared_lock for concurrent reads of the contained particles.
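  A minimal sketch of this pattern, assuming a mutex-guarded std::map as a stand-in for the concurrent hashmap, and a float[16] array standing in for glm::mat4:

```cpp
#include <array>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

using Mat4 = std::array< float, 16 >;
using IVec3 = std::array< int, 3 >;

struct GridCell {
	std::shared_mutex mtx;            // unique_lock to write, shared_lock to read
	std::vector< Mat4* > particles;   // pointers into the bump-allocated block

	void add ( Mat4* m ) {
		std::unique_lock lock( mtx ); // exclusive: we mutate the vector
		particles.push_back( m );
	}
	size_t count () {
		std::shared_lock lock( mtx ); // shared: concurrent readers are fine
		return particles.size();
	}
};

struct SparseGrid {
	std::mutex mapMtx; // stand-in for the hashmap's internal synchronization
	std::map< IVec3, std::shared_ptr< GridCell > > cells;

	// find-or-insert: get the cell for a key, creating it if absent
	std::shared_ptr< GridCell > getCell ( const IVec3& key ) {
		std::lock_guard lock( mapMtx );
		auto it = cells.find( key );
		if ( it == cells.end() )
			it = cells.insert( { key, std::make_shared< GridCell >() } ).first;
		return it->second;
	}
};
```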

Bump Allocator

  It's difficult to talk about one piece of this without talking about the others. This is a pattern I'm rather fond of - the high-level idea is that you preallocate a big block of memory from the OS, and then do your own allocation layer by atomically incrementing an index into it. Each time you increment this index, a new memory location is made available to the caller.

  This sounds basically like a hat on a hat. You want to allocate memory, and you don't really want to think about it - you'd probably do fine just getting that memory from the OS at runtime. But there is actually a major benefit to doing it this way: besides the static memory footprint, you are "recording" the time order of the allocations. The simulation happens as a series of billions of particle updates, and each time one of those updates makes a request to the bump allocator, it increments the offset into the buffer.

  Effectively, because that memory is also still linearly accessible out of the block you allocated initially, this records the sim playback. You have the mat4 of the anchored particle, which expresses its position and orientation - it's the same memory that was handed to the corresponding particle update when it was requested. You can pull all of this out at the end, and without even touching the hashmap, you have a complete, time-ordered record of all anchored particles. This is slick.
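  A sketch of the allocator under these assumptions (names are illustrative): one upfront pool, an atomic index, and the pool itself doubling as the time-ordered record:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <vector>

using Mat4 = std::array< float, 16 >;

struct BumpAllocator {
	std::vector< Mat4 > pool;        // the one big upfront allocation
	std::atomic< size_t > next{ 0 }; // index of the next free slot

	explicit BumpAllocator ( size_t capacity ) : pool( capacity ) {}

	// each call hands out the next slot - fetch_add makes this threadsafe
	Mat4* allocate () {
		size_t idx = next.fetch_add( 1, std::memory_order_relaxed );
		if ( idx >= pool.size() ) return nullptr; // pool exhausted
		return &pool[ idx ];
	}

	// playback for free: pool[0..next) is the anchor order of the sim
	size_t anchoredCount () const { return next.load(); }
};
```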

Algorithm

Simulation Parameters

  There are quite a few little knobs for the simulation. The most heavily used are probably the bonding offsets, as described before, but there are a couple of others.

Simulation On Susan's Job System

  I went into a lot more of the particulars in the Susan writeup. This project was built on top of that job system, using it to run the particle update and render functions on each thread. Susan has a large number of hardware threads of execution, which enables a significant degree of parallelism. A lot of work went into learning the C++ utilities related to threads, mutexes, and atomics, and you can read more about that on the Susan page. Most of these simulation runs go to about 20 million points, and take anywhere from a couple of minutes to overnight, depending on the specific simulation parameters.

Rendering

  I took a kind of interesting approach to rendering the animations for this project: I used an off-the-shelf single-header GIF encoder, and mostly reencoded the GIFs to mp4's with ffmpeg to share them. In the future, I'd like to link against the ffmpeg libraries directly and avoid having to produce large intermediate files. The GIFs output from this header served as a surprisingly suitable intermediate format for high-color video.

Splat-based Renderer

  The first iteration of the renderer involved splatting all the finished sim points onto the screen, with no occlusion testing of any kind. This worked relatively well, though it was not fast by any stretch of the imagination, running on the CPU and computing so many millions of point transforms. It creates an interesting ghostly effect, and accentuates some parts of the crystals that were difficult to observe in the volumetric renderings.

  It's an interesting aesthetic that might be well suited to an early-2000s nu-metal video. The flickering and camera jitter come from some attempts at autoframing and autobrightness, without any smoothing. I think it's got an interesting feeling to it, and I liked some of the outputs.

Delta Tracking Volumetric Renderer

  Delta tracking is an algorithm that Wrighter shared with me recently; it's one of the more modern approaches to volume rendering. It is a stochastic approach that yields an unbiased raymarched result for a nonuniform volume, over many samples. The individual samples are surprisingly cheap. You can see how quickly the animation frames for the videos complete - this is 512 samples per pixel at 720p on a laptop Nvidia 5070ti:

  The volume is prepared by splatting all the points into a screen-dimensioned 3D texture buffer, with a z axis of 128 texels. The points are loaded into an SSBO, and the first N are displayed based on a uniform input N. Increasing N over time increases the number of particles shown, out of the whole list that have been anchored. This enables the playback, because of the bump allocation scheme I described. You can see a couple of frames advance in the video above.

  The delta tracking algorithm is a straightforward extension to constant-step raymarching through a volume:


bool trace ( inout vec3 p, vec3 rd ) {
	float dist = 0.0f;
	float mv_dens = 40.0f; // max volume density
	float sd = -1.0f; // SDF distance from the previous sample point
	for ( float i = 0.0f; i < 50.0f; i++ ) {
		float t = -log( rand() ) / mv_dens; // delta tracking step, based on expected max density

		t = max( t, sd ); // skip through empty space if this volume is contained inside an SDF (optional)
		sd = de( p );

		// take the step
		dist += t;
		p += rd * t;

		if ( getDensity( p ) > rand() ) {
			// the ray has scattered
			return true;
		}
	}
	return false;
}
			

  Though I have now seen another, slightly different formulation, in talking to VasyaPupkin from the GP discord. He keeps a running sum and checks it against a threshold, so the sampling becomes somewhat stateful - the threshold that the sum is compared to is random, and is computed once per march:


bool RayMarching ( Ray ray, VolumeDesc desc, inout CRNG rng ) {
	Intersection intersect = IntersectAABB( ray, desc.BoundingBox );

	[branch]
	if ( intersect.Max < intersect.Min )
		return false;

	const float minT = max( intersect.Min, ray.Min );
	const float maxT = min( intersect.Max, ray.Max );

	const float threshold = -log( Rand( rng ) ) / desc.DensityScale;

	float sum = 0.0f;
	float t = minT + Rand( rng ) * desc.StepSize;
	float3 position = float3( 0.0f, 0.0f, 0.0f );

	[loop]
	while ( sum < threshold ) {
		position = ray.Origin + t * ray.Direction;
		[branch]
		if ( t >= maxT )
			return false;
		sum += desc.DensityScale * GetOpacity( desc, position ) * desc.StepSize;
		t += desc.StepSize;
	}
	return true;
}
			

  I've got to try this and compare results. I've also had some success with DDA traversals, and with a more rudimentary, expensive sampling scheme in my recent pathtraced Physarum volumes. I'll need to do a side-by-side comparison of the three methods soon, and look at the relative cost and performance of each.

  When you resolve a "hit" - that is, a random number check against the sampled density passes - the same trace function is called to see if the ray scatters on the way to a light source. With a low, uniform density in the surrounding space, these primary scattering events are relatively rare. Denser volumes, where the sampled density is more likely to exceed the random threshold, will scatter more frequently. Over time, it resolves a very nice result. The most severe artifacts come from z-axis aliasing in the volumes, where the effectively infinite frequency of the delta-function point writes cannot be properly sampled into the voxel grid without multiple jittered offset samples. I've had limited success mitigating this issue - you can see it very prominently in the videos where there should be a flat, planar segment... it has pretty severe, stairstepped aliasing.

Discussion

  This was an interesting exercise in using two different machines for the different things they excel at. Susan has 128 gigs of RAM and 36 CPU cores with 72 threads of execution; Marsha, my laptop, has a much smaller CPU and much less memory - 32 gigs is typical for a consumer system. What Marsha does have, however, is a wicked fast discrete GPU, relative to anything Susan has. This was the motivation for designing an intermediate format which I could pass between machines, to simulate on Susan and render on Marsha.

  This format, specifically, was a PNG image - kind of abused as a compressed linear data store. The semantics of image load/save APIs kind of encourage bad behavior on this front... I've used it here to represent a big long list of mat4's, which encode the particle locations and orientations. Because a mat4 is a 4x4 grid of floats, we can linearize it into 16 floats... each float is 4 bytes, enough for 1 pixel in a standard RGBA8 PNG image. With the width kept to a multiple of 16, I can use the height and width of the image to determine how many mat4's are present in the compressed store. Trimming any zeros off while saving, so that there is no bad data at the end, is a good idea.
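  A sketch of that packing, assuming an RGBA8 pixel buffer - the actual PNG encode/decode is left to whatever image library is in use, and the function names are illustrative:

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <vector>

using Mat4 = std::array< float, 16 >;

// each float's 4 bytes become one RGBA8 pixel, so a mat4 is exactly 16 pixels
std::vector< uint8_t > packMats ( const std::vector< Mat4 >& mats ) {
	std::vector< uint8_t > pixels( mats.size() * 16 * 4 ); // 16 pixels * 4 bytes
	std::memcpy( pixels.data(), mats.data(), pixels.size() );
	return pixels;
}

// the inverse: pixel count / 16 gives the number of mat4's in the store
std::vector< Mat4 > unpackMats ( const std::vector< uint8_t >& pixels ) {
	std::vector< Mat4 > mats( pixels.size() / ( 16 * 4 ) );
	std::memcpy( mats.data(), pixels.data(), pixels.size() );
	return mats;
}
```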

  The individual files are 300 megs to ~1 gig, because the entropy of this data - the raw bits of a bunch of floats - is very high. It creates many unique colors, which is an adversarial condition for the PNG format. It was functional for my purposes, but could likely be improved upon. I kept full mat4's in the format in case I wanted to load them back into another sim and resume - it should be very straightforward to do that. For rendering in the style seen on this page, you only really need the point location.

Future Directions

  I think the biggest one I want to investigate is more interesting bonding behaviors. Introducing charged particles / particle types and bond polarity/affinity would allow you to select from a different set of bonding offsets for different particle types. I think this would create some really interesting emergent behavior, as you build up more exotic structures made of varying concentrations of different particle types.

  You could base these specific offsets on some of the real particle-particle bonds that take place in the genuine article, in real atomic crystal structures. This is something that's been studied with x-ray diffraction techniques, and there are actual measurements of the relative scales and angles of the bonds involved. Crystallography is a wild field, and seems to verge on magic at times. It's not hard to understand why there is such a culture around the perceived magical properties of crystals, the most modern iteration of which I suspect may have started in the 70s with the rise of crystal oscillators. Gems have, of course, drawn the attention of humans for the entire time we've existed on the planet.

  Wrighter's code for delta tracking included an SDF march term for skipping negative space. I'd like to investigate creating a distance texture for the volume, using the jump flood algorithm. This does require compensation for the steps taken through negative space, however - if you do not compensate, there will be no chance to scatter along them. It needs to be weighted by step size, which is sort of along the lines of what VasyaPupkin's code is doing. I'll need to think about the particulars, but I think it may be possible to compute a chance to scatter based on that step size and still take the bigger steps anyway.

  I'd also like to visualize the crystals in a more raytracing-specific manner. I was very disappointed with how tricky the aliasing problem ended up being on the volumetric renderer I put together. I think I might have another idea about how to approach it: go from a delta-function point write to something that suggests a distribution, and will be resilient to grid sampling - many writes, with their amounts scaled by their distance to the center... maybe. Point is, I think I can take the list of points, remove any which are numerically identical or within some epsilon of one another, and create a BVH for millions of tiny glass spheres, the exact same way that I did using TinyBVH in Icarus.


Last updated 11/25/2025