
Question: Equations are lost when converting a Word .docx file to HTML in C#

Discussion in 'Testing & Automation' started by unity_CDD173EC6B926743E372, Apr 24, 2024.

Thread Status:
Not open for further replies.
  1. unity_CDD173EC6B926743E372


    Joined:
    Apr 24, 2024
    Posts:
    1
    I have a .NET C# application that extracts data from a Word file and saves it in a database. To store all of the Word content as it is, I first convert the whole Word file to HTML and then check the HTML paragraph by paragraph.

    During the conversion from Word to HTML, the equations in the Word document are converted into images.


    Code (CSharp):
    // Requires: using System.IO; using Microsoft.Office.Interop.Word;
    Globals.ThisAddIn.Application.ActiveDocument.Select();
    Microsoft.Office.Interop.Word.Document doc = Globals.ThisAddIn.Application.ActiveDocument;

    // Temp folder for the intermediate HTML files (GetTempPath ends with a separator).
    string result = Path.GetTempPath();

    // Remember the original path so the document can be reopened after the export.
    string tmpFileName = Globals.ThisAddIn.Application.ActiveDocument.FullName;
    doc.SaveEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUSASCII;

    // Export the document as filtered HTML, then close it without saving changes.
    if (File.Exists(result + "temp.html"))
    {
        File.Delete(result + "temp.html");
    }
    doc.SaveAs(result + "temp.html", WdSaveFormat.wdFormatFilteredHTML);
    doc.Close(Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges);

    // Round-trip the HTML through HtmlAgilityPack.
    HtmlAgilityPack.HtmlDocument mangledHTML = new HtmlAgilityPack.HtmlDocument();
    mangledHTML.Load(result + "temp.html");
    if (File.Exists(result + "newtemp.html"))
    {
        File.Delete(result + "newtemp.html");
    }
    mangledHTML.Save(result + "newtemp.html");

    // Remove standalone CRLF: keep blank-line breaks, join everything else with spaces.
    string badHTML = File.ReadAllText(result + "newtemp.html");
    badHTML = badHTML.Replace("\r\n\r\n", "ackThbbtt ");
    badHTML = badHTML.Replace("\r\n", " ");
    badHTML = badHTML.Replace("ackThbbtt ", "\r\n");
    badHTML = badHTML.Replace('�', ' ');
    if (File.Exists(result + "finaltemp.html"))
    {
        File.Delete(result + "finaltemp.html");
    }
    File.WriteAllText(result + "finaltemp.html", badHTML);

    // Clean up the intermediate temp files; finaltemp.html holds the finished result.
    File.Delete(result + "temp.html");
    File.Delete(result + "newtemp.html");

    // Reopen the original document (no separate "new Document()" is needed first).
    Microsoft.Office.Interop.Word.Document originalDoc = Globals.ThisAddIn.Application.Documents.Open(tmpFileName);

    Basically, I want to store each paragraph of the Word document separately in the database, together with all of its properties such as font size, font width, font name and font style, so that my application can display it exactly as it was written in the Word document.

    To reproduce it as it is, I need to convert it to HTML format; then, by separating the paragraphs, I can store them in the database. The problem appears when a paragraph in the Word document contains equations:


    Code (CSharp):
    Globals.ThisAddIn.Application.ActiveDocument.Select();
    Microsoft.Office.Interop.Word.Document doc = Globals.ThisAddIn.Application.ActiveDocument;

    string result = Path.GetTempPath();

    string tmpFileName = Globals.ThisAddIn.Application.ActiveDocument.FullName;
    doc.SaveEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUSASCII;

    This code converts all of the equations in my Word document into images, and because they are images, I can't display the equations properly in my application.

    So I tried to convert these equations to MathML, but I couldn't get it to work.
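One route that is often suggested for getting MathML is to bypass the HTML export for the equations: read their Office Math Markup (OMML) straight out of the .docx package and transform it with the OMML2MML.XSL stylesheet that ships with Microsoft Office. The following is only a minimal sketch under stated assumptions, not a tested solution. The class name `OmmlSketch`, its method names, and the `xslPath` parameter are hypothetical; the location of OMML2MML.XSL depends on the Office version and install type, so the caller has to supply it. On .NET Framework, `ZipFile` additionally needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical helper class; the names are illustrative only.
public static class OmmlSketch
{
    // Namespace of Office Math Markup (OMML) elements in word/document.xml.
    private static readonly XNamespace M =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // A .docx file is a ZIP archive; the main body lives in word/document.xml.
    public static string ReadDocumentXml(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found.");
            }
            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Returns every <m:oMath> element (one per equation) in the document XML.
    public static List<XElement> ExtractOmml(string documentXml)
    {
        return XDocument.Parse(documentXml).Descendants(M + "oMath").ToList();
    }

    // Transforms one OMML equation with the stylesheet at xslPath
    // (assumed to be Office's OMML2MML.XSL) and returns the result as text.
    public static string ToMathMl(XElement oMath, string xslPath)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xslPath);

        // Emit a bare fragment so the result can be embedded in HTML.
        XmlWriterSettings settings = xslt.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        var sb = new StringBuilder();
        using (XmlReader input = oMath.CreateReader())
        using (XmlWriter output = XmlWriter.Create(sb, settings))
        {
            xslt.Transform(input, output);
        }
        return sb.ToString();
    }
}
```

With something like this, each equation could be stored as a MathML string next to its paragraph, for example by calling `OmmlSketch.ExtractOmml(OmmlSketch.ReadDocumentXml(tmpFileName))` and passing each element to `ToMathMl`. This assumes the document has already been saved as .docx, and it only looks at the main document body, not at headers, footers or footnotes.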
     
  2. superpig


    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,667
    This isn't a Unity question. I suggest asking on a general programming site, such as StackOverflow.
     