[Vertices Merging] check vertices array for duplicates

S_P_S · Jun 1, 2016

Hello,

I am currectly parsing 3D geometry data at runtime and it's actually working really well expect for one thing.

When I import large meshes, the loading time is huge because of the vertex merging I do.
I have a face class which stores information about the current face I am parsing.
When creating that face I am checking if any of the vertices used for that face is equals to the vertices I have already in my mesh. If thats the case I will check if the uv and normal is also the same and if it is I will use this vertex instead of a new one.

But checking the whole vertices array for thousands of vertices takes super long (~4 minutes compared to ~7 seconds whithout vertices merging)

This approach is super slow, there has to be a faster way, but I have no idea.

Maybe someone can give me an advice.

best regards

lordofduct · Jun 1, 2016

Optimizing code depends on how you wrote the code. We don't know how you wrote it, so we couldn't point out places that could be optimized.

Is there a reason this needs to be done at runtime? Can you not bake this at compile time? Is this like a realtime edit tool for the user to use?

What benefits are you getting from the vertex merging?

JoeStrout · Jun 1, 2016

_Beyond_ said: ↑

But checking the whole vertices array for thousands of vertices takes super long (~4 minutes compared to ~7 seconds whithout vertices merging)

This approach is super slow, there has to be a faster way, but I have no idea.
Click to expand...

You should use a Dictionary for this purpose. I've done the same thing in several projects; it works. The only caveat is that, in principle, you could get two Vector3's which are equivalent and yet don't hash to the some thing, and so fail to match up. But in practice, all your Vector3's here come from a file and go through the same parsing steps, so it is almost certainly not going to be a problem.

S_P_S · Jun 1, 2016

Thanks for your reply!

Here is my code:

Code (CSharp):

public int Merge(Vector3 vertex, Vector2 uv, Vector3 normal) {

if (vertices.Contains(vertex)) {

for (int i = 0; i < vertices.Count; i++) {

if (vertices[i].Equals(vertex)) {

if (uvs[i].Equals(uv) && normals[i].Equals(normal)) {

return i;

}

else {

continue;

}

}

else {

continue;

}

}

return -1;

}

else {

return -1;

}

}

I am searching a specific vertex in a vertices array. If I find an equal vertex I check if the uv and normal are also equal.

But this is very inefficient for large meshes, because I have to go through all vertices of the mesh, for all my vertices I want to parse.

I need a better approach for this vertex merging, but a better algorithm won't come to my mind.

S_P_S · Jun 1, 2016

Hey Joe,

I hope you are well!

Thanks for your advice, I will look into it.

best regards

lordofduct · Jun 1, 2016

Code (csharp):

public int Merge(Vector3 vertex, Vector2 uv, Vector3 normal) {

if (vertices.Contains(vertex)) {

for (int i = 0; i < vertices.Count; i++) {

if (vertices[i].Equals(vertex)) {

if (uvs[i].Equals(uv) && normals[i].Equals(normal)) {

return i;

}

else {

continue;

}

}

else {

continue;

}

}

return -1;

}

else {

return -1;

}

}

You're doing a double search in this code, you first check if it contains, then you loop through.

Thing is, a contains function on what appears to be either an array/list loops through each entry and compares. You're effectively doing this twice by saying "if loop through and test equality finds result, then loop through and test for equality".

This also appears to only be for testing a single vertex, I'm assuming there's other code for testing ALL vertices, which can probably be optimized as well.

Furthermore, the name of this function is weird... this function doesn't merge at all. It merely finds a vertex that matches the supplied vertex.

So lets just shorten up this bad boy here a little bit:

Code (csharp):

public int FindMatchingVertex(Vector3 vertex, Vector2 uv, Vector3 normal)

{

for(int i = 0; i < vertices.Count; i++)

{

if(vertices[i].Equals(vertext) && uvs[i].Equals(uv) && normals[i].Equals(normal))

return i;

}

return -1;

}

Of course, this could be probably sped up much much more with hash table for lookup (a dictionary could do this like JoeStrout suggested), but I don't know your entire source code for merging, so I can't optmize it well without seeing what exactly you need.

lordofduct · Jun 1, 2016

Here's a very basic lookup table based merge tool... it only has implementation for checking if a vert exists currently, but you could build from here:

Code (csharp):

public class MergeTool

{

#region Fields

private Dictionary<Vertex, int> _table = new Dictionary<Vertex, int>(VertexComparer.Default)

#endregion

#region CONSTRUCTOR

public MergeTool(Vector3[] verts, Vector2[] uvs, Vector3[] normals)

{

for(int i = 0; i < verts.Length; i++)

{

_table[new Vertex(verts[i], uvs[i], normals[i])] = i;

}

}

#endregion

#region Methods

public int ContainsVertex(Vector3 vertex, Vector2 uv, Vector3 normal)

{

var v = new Vertex(vertex, uv, normal);

int i = 0;

if(_table.TryGetValue(v, out i))

return i;

else

return -1;

}

#endregion

#region Special Types

public struct Vertex

{

public Vector3 vert;

public Vector2 uv;

public Vector3 normal;

public Vertext(Vector3 vert, Vector2 uv, Vector3 normal)

{

this.vert = vert;

this.uv = uv;

this.normal = normal;

}

}

private class VertexComparer : IEqualityComparer<Vertext>

{

private static readonly VertexComparer Default = new VertexComparer();

public bool Equals(Vertex a, Vertex b)

{

return a.vert.Equals(b.vert) && a.uv.Equals(b.uv) && a.normal.Equals(b.uv);

}

public int GetHashCode(Vertex v)

{

return v.vert.GetHashCode() ^ v.uv.GetHashCode() ^ v.normal.GetHashCode();

}

}

#endregion

}

Note I write a custom VertexComparer to handle the equality testing rather than overriding the object methods, I just prefer this route.

S_P_S · Jun 2, 2016

Thank you very much for your great help!

My posted code or your improved code make almost no difference. Both were super slow for me.

But using dictionaries worked really well (4 seconds loading time comapred to 5 minutes).

But where is the exact difference between dictionaries and lists?
In both approaches I searched the list/dictionary for a certain object. But the time they need is so different.

best regards and thanks to both of you!

lordofduct · Jun 2, 2016

My posted code or your improved code make almost no difference. Both were super slow for me.
Click to expand...

Yeah, I didn't expect it to be that much faster. Because the entire approach was naive. I merely cleaned up what was otherwise sloppy code.

But using dictionaries worked really well (4 seconds loading time comapred to 5 minutes).

But where is the exact difference between dictionaries and lists?
In both approaches I searched the list/dictionary for a certain object. But the time they need is so different.
Click to expand...

lists are an ordered sequence of elements.

Dictionary and HashSet use hashed sets. Sets are an un-ordered collection.

Basically imagine you have an array of length N, where N is a prime number near the size of your collection. In the case of a dictionary, the 'key' is what gets hashed (in my second example code above it's the Vertex struct). That has 'GetHashCode' called on it (a method all objects have in .net) or GetHashCode of an IEqualityComparer supplied to the collection (for custom hashing).

This hash value is just an integer that is associated with that object. It doesn't have to be unique, but it does have to always be returned for that object. If I have a string "ABC" and its hashcode happens to be 0xFF1299, than it should always be that (on that system).

Now you take that hash value and you modulo it around N. This gives you your index within the array of length N.

If 2 objects collide then you put them in what is called a 'bucket'. Basically its a short linklist of items within that index of the array. Thing is, there should be very few collisions of hashcodes (it's why N should be prime).

As a result, lookups are super fast. You don't have to traverse the entire collection trying to find it. Instead you generate the index it is at based on its hashcode, and at worst you get a linklist of like 3 entries to have to compare against. As opposed to the thousands you would have if you traversed the entire collection.

Note, in my example code above, I used Dictionary where the vertex was a key, and the value was the index in the original collection.

BUT, if that index isn't needed, and you're just shoving garbage in the 'value' part of the dictionary. You could use a 'HashSet<T>' instead:
https://msdn.microsoft.com/en-us/library/bb359438(v=vs.110).aspx

It's basically the 'key' part of a dictionary, without the 'value' part. It takes up less memory that way.

This can be perceived as a bag... it's unordered, and just contains stuff. You can easily put stuff in the bag, confirm if the bag contains anything, but when you go to loop over the stuff in the bag... you're just grabbing stuff out of the bag one at a time. You don't know what order it'll come out. This is great when the order doesn't matter, you just need to hold stuff.

Here you can find a full breakdown of hash tables (a dictionary):
https://en.wikipedia.org/wiki/Hash_table

Honestly, the best way to think about it is like a filing cabinet.

When you have a filing cabinet, you sort by some hash so as to make finding files in it faster. The simplest method might be grouping by 'first letter of surname'. So if you need to look up 'John Smith', you'd go to the 'S' drawer. Now you don't have to search the ENTIRE filing cabinet, only the 'S' drawer.

The more specific your hash, the shorter the search. If you grouped by first 3 characters of last name, you'd go to your 'Smi' drawer. But of course, this more specific would only work as your filing cabinet got larger. Having a single drawer for 'Smi' when you have only 15 files is ridiculous. But when you had 1 million files, it makes way more sense.

The only difference between this and the hash in .net/mono, is that there we turn it into a integer rather than some specialized string/word... because computers prefer numbers to words, where as humans prefer words to numbers. But that's fine, like integers are simple, their hash is themselves, where as a string might be a XOR shift of all of their characters in Ascii. A short simple algorithm to calculate a number... just like finding the human hash of 'John Smith' is fast... we can look at the value and determine the hash lickity split.

Search Unity

Unity ID

Useful Searches

[Vertices Merging] check vertices array for duplicates