Question String / Text object : How to copy the 1st sentence of a paragraph at runtime

Discussion in 'Scripting' started by SI_007, Jan 23, 2022.

  1. SI_007

    Joined: Aug 10, 2015
    Posts: 83

    Hi there!

    I am using the Wikipedia API to extract text based on user queries. It works, and it gives me a string containing the first full paragraph (e.g., 4-5 lines of text).

    I would like to copy the first sentence of this paragraph so I can pass it to a text-to-speech plugin afterwards.

    I haven't been able to find any information on how to manipulate the contents of a string variable or a text object at runtime in a C# script. Would anyone know how to accomplish this?

    Thanks!
    Pascal
     
  2. Kurt-Dekker

    Joined: Mar 16, 2013
    Posts: 36,756

    You can use regular expressions to parse and tear apart strings.

    https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex?view=net-6.0

    Reliably chopping up an arbitrary chunk of text from a source you don't control mostly means discovering the ways that text can be formatted to break your code, then redesigning your code to handle each exceptional case.

    Some examples of exceptional circumstances might be:

    - there is no first sentence
    - the first sentence ends with something other than a period
    - there is a non-sentence-ending period somewhere in the first sentence (such as "I drove from Main St. to the beach.")
    - etc
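
    A minimal sketch of the regex approach, assuming a sentence ends at the first '.', '!' or '?' that is followed by whitespace or the end of the text. The class and method names (SentenceSplitter, GetFirstSentence) are made up for illustration, and the pattern does not handle the abbreviation case listed above.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class SentenceSplitter
    {
        // Lazily match from the start of the text up to the first '.', '!' or '?'
        // that is followed by whitespace or the end of the text.
        // Singleline lets '.' in the pattern match newlines as well.
        private static readonly Regex FirstSentence =
            new Regex(@"^\s*(.+?[.!?])(?=\s|$)", RegexOptions.Singleline);

        public static string GetFirstSentence(string text)
        {
            // Case: there is no first sentence at all.
            if (string.IsNullOrWhiteSpace(text)) return string.Empty;

            Match match = FirstSentence.Match(text);

            // Case: no sentence terminator found, so fall back to the whole text.
            return match.Success ? match.Groups[1].Value : text.Trim();
        }
    }
    ```

    Given "I drove from Main St. to the beach.", this returns "I drove from Main St.", which is exactly the kind of failure you would then have to special-case.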
     
  3. SI_007

    Joined: Aug 10, 2015
    Posts: 83

    Hi Kurt-Dekker,

    Thank you very much for the suggestion and the reference. This approach does carry some risk, though. I'll keep looking for a better option, but it is very useful to know that this is probably the best that can be achieved by working on the content of the string.

    Thanks again!
    Pascal
     
  4. orionsyndrome

    Joined: May 4, 2014
    Posts: 3,043

    You should be querying for HTML or wiki markup. That way you get structured text, so you can extract the first paragraph, which is wrapped in a <p> tag (or, in the case of wiki markup, is the text up to two successive end-of-line characters). Sadly, there is no way to accurately delimit just the first sentence, or any number of sentences, so you have to do it by brute force and hope for the best.

    Judging by this MediaWiki extension, which is troubled in exactly the way Kurt-Dekker described, I doubt there is a reliable way to do it.
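
    For the paragraph part, a minimal sketch, assuming the text is plain text or wiki markup in which paragraphs are separated by a blank line. The names ParagraphExtractor and GetFirstParagraph are hypothetical, and HTML input would need a proper parser instead.

    ```csharp
    using System;

    public static class ParagraphExtractor
    {
        public static string GetFirstParagraph(string text)
        {
            if (string.IsNullOrEmpty(text)) return string.Empty;

            // Normalize Windows (\r\n) and bare \r line endings to \n,
            // then drop any leading or trailing newlines.
            string normalized = text.Replace("\r\n", "\n").Replace('\r', '\n').Trim('\n');

            // The first paragraph ends at the first pair of successive newlines.
            int end = normalized.IndexOf("\n\n", StringComparison.Ordinal);
            return end >= 0 ? normalized.Substring(0, end) : normalized;
        }
    }
    ```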
     
  5. SI_007

    Joined: Aug 10, 2015
    Posts: 83

    Thanks, orionsyndrome.

    Indeed, the reliability issue makes it difficult to stick with my initial approach to the information extracted from Wikipedia (i.e., the first sentence only). I will most likely opt for a different approach (e.g., a generic TTS statement such as "Here's what I have found on Wikipedia") to circumvent the issue.

    Thanks again!
    Pascal
     
  6. orionsyndrome

    Joined: May 4, 2014
    Posts: 3,043

    The best you can do is similar to what Wikipedia does for its own article previews: an excerpt.
    Simply copy the first N characters or the full paragraph, whichever is shorter. If the text is cut mid-sentence, append an ellipsis (...). IMHO that's good enough.
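
    A rough sketch of that excerpt idea. The names ExcerptBuilder and MakeExcerpt are made up for illustration, and backing up to the last space so a word isn't cut in half is an extra assumption on top of what is described above.

    ```csharp
    using System;

    public static class ExcerptBuilder
    {
        public static string MakeExcerpt(string paragraph, int maxChars)
        {
            if (string.IsNullOrEmpty(paragraph) || maxChars <= 0) return string.Empty;

            // The whole paragraph fits, so nothing is cut and no ellipsis is needed.
            if (paragraph.Length <= maxChars) return paragraph;

            // Take the first maxChars characters.
            string cut = paragraph.Substring(0, maxChars);

            // Assumption: prefer to end on a word boundary rather than mid-word.
            int lastSpace = cut.LastIndexOf(' ');
            if (lastSpace > 0) cut = cut.Substring(0, lastSpace);

            // The text was cut short, so mark it with an ellipsis.
            return cut.TrimEnd(' ', ',', ';', ':') + "...";
        }
    }
    ```

    For example, MakeExcerpt("The quick brown fox jumps over the lazy dog.", 18) gives "The quick brown...".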
     
  7. adehm

    Joined: May 3, 2017
    Posts: 369

    Code (CSharp):
    string paragraph = "This is a sentence that ends in a single period so it will be straightforward. You may need to find some special circumstances to make it more robust.";
    // Copy characters up to and including the first period.
    char[] characterArray = new char[paragraph.Length];
    int length = 0;
    for (int i = 0; i < paragraph.Length; i++)
    {
        characterArray[i] = paragraph[i];
        length = i + 1;
        if (characterArray[i] == '.') break;
    }
    // Use only the filled part of the array; otherwise the string is padded with '\0' characters.
    string sentence = new string(characterArray, 0, length);
    Debug.Log(sentence);
     
    Last edited: Feb 4, 2022
  8. orionsyndrome

    Joined: May 4, 2014
    Posts: 3,043

    That code works okay in general, but is not the best suited for this.

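    Code (csharp):
    string ExtractSentence(string text) {
      // Look for the first period that is followed by a space.
      var index = text.IndexOf(". ");
      // Keep the period itself (index + 1); with no match, return the whole text.
      if(index >= 0) return text.Substring(0, index + 1);
      return text;
    }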
    I can't tell if OP needs such an example, but that's all that's needed.
     
  9. SI_007

    Joined: Aug 10, 2015
    Posts: 83

    Thank you very much for the code examples, orionsyndrome and adehm. I will certainly test them out, even though at the moment I'm more inclined to change my original approach (which was to use the first sentence of the Wikipedia extract as text-to-speech output).

    Thanks again!
    Pascal