
Unity's version control component has been upgraded to Plastic SCM.

Collaborate doesn't work with accent marks and other special characters

Discussion in 'Unity Collaborate' started by TX128, Jan 30, 2019.

  1. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    Hi,

    I'm the lead programmer on my project and this problem is driving me kind of crazy. Unity handles most files with accent marks and other kinds of special characters like commas just fine, but Collaborate does not. I'm not even going to try to describe the problem thoroughly because it is incredibly obvious: simply add some files with these characters to your project and try to download the project from scratch on another machine (or on the same machine after removing the project). These files appear as modified. When you try to rename them from Unity, it hangs for minutes or crashes.

    These kinds of characters in file names should be avoided, and I do fix them and never upload them. However, the art team doesn't seem to get that, and recently uploaded a folder with a comma in its name, which forced me to rename it from File Explorer. That meant re-uploading 1.6GB of files to the server, because Collaborate couldn't figure out it was a rename.

    Honestly, Collaborate is really broken and someone should fix it. I can deal with random disconnects, with 1-2 minute startups (for whatever reason it appears to do a linear search over the files while blocking the user), and with having to download the patch I just uploaded from time to time. But this situation with the accent marks is just too much.
     
  2. Marc-Saubion

    Marc-Saubion

    Joined:
    Jul 6, 2011
    Posts:
    655
    Well, that explains a lot.

    I've had this issue on one computer for months, with Collaborate repeatedly marking some assets as changed. I now realize they were assets with special characters.

    I'm trying to rename them and have already lost half a day of work on that, because Collaborate handles it so well that it gets stuck on errors or simply crashes the whole editor. I now have to revert all my special-character fixes because Collaborate doesn't work at all anymore.

    Brilliant! As usual with Collaborate...
     
  3. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    Thankfully my assets were from the art team, so renaming them from outside Unity didn't break anything, but for others that is a huge pain... this should be fixed ASAP.
     
    Marc-Saubion likes this.
  4. Marc-Saubion

    Marc-Saubion

    Joined:
    Jul 6, 2011
    Posts:
    655
    Considering Collaborate currently has all its features broken, that won't be fixed.
     
    TX128 likes this.
  5. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    We're currently paying for it, because otherwise we can't work in parallel. Is there any other system? GitHub for Unity was really broken the last time I tried.
     
  6. Marc-Saubion

    Marc-Saubion

    Joined:
    Jul 6, 2011
    Posts:
    655
    Same here. I manage a 60GB+ project and can't move it to a new VCS without making sure it can handle it.

    I tried GitHub these last few days, not "GitHub for Unity" but "GitHub Desktop". It works fine and we even managed to merge a scene, which was cool, but GitHub's policy on big files is very unclear. They tell you storage is unlimited, but you need to use Git LFS for large files. If you do, you need to pay extra; I didn't, and it works anyway o_O. You need to configure it manually through the command prompt and have no way to know whether you did things correctly.
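For what it's worth, Git LFS keeps its tracking rules in a `.gitattributes` file: `git lfs track "*.psd"` appends a line like `*.psd filter=lfs diff=lfs merge=lfs -text`, and `git check-attr filter -- <path>` reports what Git actually resolved for a given file. The Python below is only a rough, offline approximation of that resolution, to show how the rules combine: `lfs_tracked` is a hypothetical helper, its `fnmatch`-based matching is looser than Git's real pattern rules, and the `.psd`/`.fbx` patterns are just examples.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def lfs_tracked(gitattributes: str, path: str) -> bool:
    """Approximate whether repo-relative `path` ends up with filter=lfs.

    Simplification (not Git's exact semantics): patterns without '/' are
    matched against the file name only, other patterns against the whole path.
    """
    filter_value = None
    for line in gitattributes.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        pattern, attrs = parts[0], parts[1:]
        target = path if "/" in pattern else PurePosixPath(path).name
        if not fnmatchcase(target, pattern.lstrip("/")):
            continue
        # Later matching lines override earlier ones for the same attribute.
        for attr in attrs:
            if attr.startswith("filter="):
                filter_value = attr.split("=", 1)[1]
            elif attr in ("-filter", "!filter"):
                filter_value = None
    return filter_value == "lfs"


# Example rules, as `git lfs track "*.psd"` and `git lfs track "*.fbx"` would write them.
ATTRS = """\
*.psd filter=lfs diff=lfs merge=lfs -text
*.fbx filter=lfs diff=lfs merge=lfs -text
"""

print(lfs_tracked(ATTRS, "Assets/Art/hero.psd"))       # True
print(lfs_tracked(ATTRS, "Assets/Scripts/Player.cs"))  # False
```

For a real repository, `git check-attr` remains the authoritative answer; the sketch only illustrates why a file might silently end up outside LFS.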

    So even if it works on my 5GB guinea pig project, I have to leave GitHub because it could stop working at any time and is very unfriendly for non-programmers.

    This week I think I'm going to try Plastic SCM, as the Playmaker team recommended it following Collaborate's issues. They officially handle large files, have an optional artist-friendly UI, and if you look in your Unity settings, Plastic SCM is listed as a pre-configured VCS, which is a good sign.



    They do have a one-month free trial, so you should check it out.

    I'm crossing my fingers, because Collaborate has become so bad that my company is bleeding money. I can't wait to be done with it.
     
  7. Ryan-Unity

    Ryan-Unity

    Joined:
    Mar 23, 2016
    Posts:
    1,993
    Hi @TX128! Do you happen to know if your artists are typing these accent marks into their filenames or copy-pasting them from an ISO Latin 1 encoded document? Because of how UTF-8 works, the single-byte ISO Latin 1 encoding of French accented characters is an illegal byte sequence in UTF-8, which could explain why you're running into these issues.
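To make that concrete, here is a small Python sketch (Python is only used for illustration; nothing here is Collab code) showing that the single byte ISO Latin 1 uses for é is rejected by a strict UTF-8 decoder, while the UTF-8 form of the same character decodes fine. `is_valid_utf8` is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


latin1_bytes = "é".encode("latin-1")  # b'\xe9': one byte with the high bit set
utf8_bytes = "é".encode("utf-8")      # b'\xc3\xa9': lead byte + continuation byte

print(latin1_bytes, is_valid_utf8(latin1_bytes))  # b'\xe9' False
print(utf8_bytes, is_valid_utf8(utf8_bytes))      # b'\xc3\xa9' True
```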
     
  8. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    Hi @ryanc-unity
    1- He types them up
    2- It shouldn't matter (?), should it?
    3- All the characters I'm talking about are part of ISO Latin 1 (ISO/IEC 8859-1).
    4- Comma is part of ASCII

    I've had problems with:
    002C -> Comma
    00C0 - 00EF -> Characters with accent marks; I can pinpoint them more specifically if you need.
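If you need a list of offending files to attach to a bug report, a short script can walk the project and print the code points of every suspect character in file and folder names. This is a sketch, not a Unity tool: `risky_chars` and `find_risky_names` are hypothetical helpers, and the check is deliberately broader than the ranges listed above (it flags commas plus anything outside printable ASCII).

```python
from __future__ import annotations

from pathlib import Path


def risky_chars(name: str) -> list[str]:
    """Code points in `name` that are a comma or outside printable ASCII."""
    return [f"U+{ord(c):04X}" for c in name
            if c == "," or not 0x20 <= ord(c) < 0x7F]


def find_risky_names(root: str) -> list[tuple[str, list[str]]]:
    """Walk `root` recursively and report every file/folder name with risky characters."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        chars = risky_chars(path.name)
        if chars:
            hits.append((path.relative_to(root).as_posix(), chars))
    return hits


print(risky_chars("baño.cs"))  # ['U+00F1']
```

Running `find_risky_names("Assets")` from the project folder would then list each offending path together with its code points, which is the kind of detail the Unity team asks for later in this thread.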
     
  9. Gurg

    Gurg

    Unity Technologies

    Joined:
    Nov 9, 2016
    Posts:
    73
    So this is interesting. As far as I know, part of Unity's asset import pipeline is supposed to be responsible for converting to UTF-8. Windows uses UTF-16 under the hood, which would still be fine. In this discussion we've narrowed the problem down to the character set added by ISO Latin 1. That should only be a problem if some text is still encoded in ISO Latin 1 by the time it reaches Collab. How is it bypassing the UTF-8 conversion step in the pipeline? (That pipeline assumption is why Collab shouldn't have to care about character encoding.)

    Are those the actual byte values of your text you are posting or just the results of the lookup of the problem characters in the ISO Latin 1 table? The true byte values here are important.

    For example, if we take À and look up its true byte values, they would tell us what it is encoded in. A single byte 0xC0 would mean it is still encoded in ISO Latin 1. The two-byte sequence 0xC3 0x80 would mean that it is UTF-8 encoded. Collab assumes UTF-8 encoding for all operations, I believe. If you could find out the byte values of the affected strings, that would help us solve the issue.
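To get those byte values without hunting through tables, a few lines of Python (used purely for illustration; `byte_report` is a hypothetical helper) will dump the code points and the bytes of a string in both encodings mentioned above. For À it shows the single byte 0xC0 under ISO Latin 1 and the pair 0xC3 0x80 under UTF-8.

```python
from __future__ import annotations


def byte_report(text: str) -> dict[str, str]:
    """Code points of `text` plus its bytes in ISO Latin 1 and UTF-8, as hex."""
    return {
        "code points": " ".join(f"U+{ord(c):04X}" for c in text),
        # Characters that ISO Latin 1 cannot represent are encoded as '?'.
        "ISO Latin 1": text.encode("latin-1", errors="replace").hex(" "),
        "UTF-8": text.encode("utf-8").hex(" "),
    }


for label, value in byte_report("À").items():
    print(f"{label}: {value}")
# code points: U+00C0
# ISO Latin 1: c0
# UTF-8: c3 80
```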
     
    Cambesa likes this.
  10. Gurg

    Gurg

    Unity Technologies

    Joined:
    Nov 9, 2016
    Posts:
    73
    For some further background and fun experiments with character encoding to understand the possible issue, you can try the following.

    1. Open Notepad++
    2. Set the character encoding to ISO Latin 1
    3. Type a string with the À character
    4. Change the character encoding to UTF-8 without converting the byte values of the text already in place
    5. Observe how the string is now displayed
    You should see that the character that was an À is no longer properly displayed. This is because in ISO Latin 1, À is a 1-byte character that starts with a leading 1 bit. In UTF-8 there is no such thing as a single-byte character with a leading 1 bit; a leading 1 bit means it is part of a multi-byte character. How the parsing error is handled depends on the UTF-8 parser implementation. Some parsers are intelligent enough to realize it is an error and skip the character. Others will consume the following byte as part of the same character, even when it isn't a valid continuation byte, which can result in more than just the À getting mangled.
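The difference between parsers can be reproduced in a few lines. Python's own decoder is one of the careful ones (`errors="replace"` substitutes U+FFFD, `errors="ignore"` drops the bad byte), while `naive_utf8_decode` below is a deliberately careless, hypothetical decoder, written only for this illustration, that trusts the lead byte's length bits and swallows the next byte without checking it.

```python
def naive_utf8_decode(data: bytes) -> str:
    """Hypothetical careless UTF-8 decoder (for illustration only).

    It reads the sequence length from the lead byte's high bits and then
    absorbs that many following bytes without checking that they really are
    continuation bytes (10xxxxxx).
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:        # 0xxxxxxx: ASCII, one byte
            length, cp = 1, lead
        elif lead < 0xC0:      # 10xxxxxx: stray continuation byte
            length, cp = 1, 0xFFFD
        elif lead < 0xE0:      # 110xxxxx: two-byte sequence
            length, cp = 2, lead & 0x1F
        elif lead < 0xF0:      # 1110xxxx: three-byte sequence
            length, cp = 3, lead & 0x0F
        else:                  # 11110xxx (and invalid 0xF8+): four bytes
            length, cp = 4, lead & 0x07
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)  # no validation of the following byte
        out.append(chr(cp) if cp <= 0x10FFFF else "\ufffd")
        i += length
    return "".join(out)


raw = "Àb.cs".encode("latin-1")                     # b'\xc0b.cs'
print(repr(raw.decode("utf-8", errors="replace")))  # U+FFFD followed by 'b.cs'
print(repr(raw.decode("utf-8", errors="ignore")))   # 'b.cs'
print(repr(naive_utf8_decode(raw)))                 # '".cs'
```

The careless decoder merges the À and the following `b` into a single `"` character (which, incidentally, Windows does not allow in file names at all), exactly the "more than just the À getting mangled" case.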

    Sometimes I question if Unicode was a mistake.
     
  11. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    I can try that if you want, but I think I failed to communicate the most important part here: I'm talking about file names, not file content.

    edit: the characters I talked about were based on looking them up in the table, not on 'empirical' data
     
  12. Gurg

    Gurg

    Unity Technologies

    Joined:
    Nov 9, 2016
    Posts:
    73
    Yeah, that would help explain why it doesn't go through the UTF-8 converter in the pipeline. What Operating System are you both using and what version of Unity?
     
  13. Gurg

    Gurg

    Unity Technologies

    Joined:
    Nov 9, 2016
    Posts:
    73
    I just tried the Notepad++ trick I mentioned above on Windows 10 with Unity 2018.3.0f2. The result I got was that the operating system itself dropped the Á from the file name. That said, I had Synergy running in the background and I'm wondering whether it modified my clipboard as a side effect. I'll run some more tests.
     
  14. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    We're both running Windows and Unity. Before going to sleep I'll run some tests, but isn't it weird that it also fails with a comma? It is standard ASCII after all.
     
  15. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    Weirdly enough, I can't seem to reproduce this with the comma now. But it has certainly happened with a comma in a folder name, and on rename it crashed Unity.

    Is Collaborate written in C# or C++?
     
  16. Gurg

    Gurg

    Unity Technologies

    Joined:
    Nov 9, 2016
    Posts:
    73
    That is a wonderful question and the short answer is a combo of C++, C#, and JavaScript but is moving towards being mostly C#. Apologies in advance for delayed replies, the snow here is being a distraction.
     
  17. Gurg

    Gurg

    Unity Technologies

    Joined:
    Nov 9, 2016
    Posts:
    73
    Okay, longer answer time.

    Like I said, it's a combo of C++, C#, and JavaScript. However, the JavaScript is only used for the UI of the toolbar window (that window that pops up over everything when you click the collab button that sometimes has the green checkmark). The Collab History window used to also be JavaScript but has since been converted to using the newer UI Elements framework and C#. Work is underway to do the same for the toolbar window so we can drop the JavaScript portion of the codebase entirely.

    The C++ section is mostly there for a combination of legacy and internal-API reasons. However, moving forward, even this is shrinking. We're moving away from the C++-based legacy proprietary snapshot system, which tried to track the current state in a flat file stored at <project folder>/Library/Collab/<generated_snapshot_name>.txt, and instead moving to a system with C#-based LibGit2 that tracks changes in a git-style log stored locally in a .collab folder (like a .git folder). You can get a preview of this if you are using Unity 2018.3 with the Collab 1.3.x preview package, or Unity 2019.1. We're also converting other C++ code to C# for multiple reasons, including having a larger percentage of Collab's logic live in the Collab package so that bug fixes can ship outside the normal editor release cycle. It wouldn't be surprising if, in the not too distant future, Collab became 90% to 100% C#.
     
  18. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    I would actually bet money that whatever is wrong is some assumption made in the C++ part when handling file names. Is there any hope of Collaborate becoming partially open source? Otherwise I don't think I can do much apart from reporting this issue as is :(
     
  19. Gurg

    Gurg

    Unity Technologies

    Joined:
    Nov 9, 2016
    Posts:
    73
    The C# part is actually available through our C# reference repo on Github.
    https://github.com/Unity-Technologi...e70c6b3a4e9b794e8abe7cf334/Editor/Mono/Collab

    The Collab logic should just be taking in whatever it's given, or, as one of the other engineers puts it, "garbage in, garbage out". Hence why I was wondering about the actual binary value and encoding of the filename strings. That said, the scenario I mentioned should be a really rare edge case, and if you are both using Windows with the same version of Unity, it is not a scenario you should run into. (I think Windows uses UTF-16 under the hood for character encoding, so ISO Latin 1 should not be involved in your described workflow.)
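A small Python illustration of both halves of that reply, using `baño.cs` from this thread as the example name: converting a UTF-16 name (the 16-bit form Windows' wide-character file APIs work with) to UTF-8 is lossless and never goes through ISO Latin 1, whereas UTF-8 bytes that some component wrongly reads as ISO Latin 1 come out as mojibake. This only shows the encodings involved; it is not a claim about which part of Collab does what.

```python
name = "baño.cs"

# UTF-16 little-endian, the 16-bit form used by Windows wide-character APIs.
utf16 = name.encode("utf-16-le")
print(utf16.hex(" "))  # 62 00 61 00 f1 00 6f 00 2e 00 63 00 73 00

# UTF-16 -> UTF-8 conversion: no ISO Latin 1 step, nothing lost.
utf8 = utf16.decode("utf-16-le").encode("utf-8")
print(utf8)  # b'ba\xc3\xb1o.cs'

# Garbage in, garbage out: the same UTF-8 bytes misread as ISO Latin 1.
print(utf8.decode("latin-1"))  # baÃ±o.cs
```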

    Also, maybe I am blind, but I don't see any mention in the thread of which editor version you are using. Can I get the Editor version from you so I can make sure I am trying to reproduce this on the same version? Also, I know you said you are using Windows, but just to make sure: you are both using Windows 10, correct?
     
  20. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    Correct, we are using Windows 10 with the latest updates. We just updated to Unity 2018.3.5f. Before that we were using 2018.3.

    I'll check the Collab code later, because honestly, encoding problems aside, the fact that it takes minutes to update files of a few KB amazes me.
     
  21. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    Just in case you want to share this problem with someone, the steps to reproduce should be roughly as follows:
    - Set up a new project with Collaborate, etc.
    - Create a file with an accent mark or some other non-ASCII character. Example: baño.cs
    - Download it on another computer.

    Expected result:
    The download finishes without problems and the computer where the changes were downloaded shows no differences.

    Real result:
    The download finishes and reports no problems; however, the file baño.cs is recognized as either a new or a changed file, even though it was not modified on the second machine.
     
  22. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    Example of the problem today:
     
  23. TX128

    TX128

    Joined:
    Aug 22, 2018
    Posts:
    25
    And when trying to fix it: