Extract HTML Table from Web Page source

Discussion in 'Scripting' started by InfinityCoder88, Jan 29, 2018.

  1. InfinityCoder88

    Joined: Oct 4, 2013
    Posts: 245
    Hi my friends,

    I built a script in Unity that downloads the source of a specific web page and saves it to a text file. What I really want is to extract only the HTML table data from that page. For example, in the source below, everything outside the <table> element should be removed, and the <table> block is the part to keep and extract data from:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html lang="en" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
    <head>
    <link rel="shortcut icon" href="https://mywebsite.com/favicon.ico" />
    <title>My Website Com</title>
    <meta name="description" content="Ministerul pentru intreprinderi mici si mijlocii, comert, turism si profesii liberale"/>
    <meta name="keywords" content="My Web Site"/>
    <meta name="Language" content="en"/>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <meta name="rating" content="General" />
    <meta name="revisit-after" content="7 Days" />
    <meta name="robots" content="index,follow" />
    <link rel="shortcut icon" href="/favicon.ico" />
    <meta name="publisher" content="Unity Design" />
    <meta name="copyright" content="Copyright (c) Unity Design" />
    <meta name="author" content="Developed by Unity Design - www.UnityDesign.com" />
    <link href="/css/style.css?t=2017061401" rel="stylesheet" type="text/css" />
    <link href="/css/uploader.css?t=2017061401" rel="stylesheet" type="text/css" />
    <script type="text/javascript" src="/js/jquery-1.8.0.min.js?t=2017061401"></script>
    <script>
    var jQr = jQuery.noConflict();
    </script>
    <script type="text/javascript" src="/js/mootools-1.2.5-core-yc.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/mootools-1.2.5.1-more.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Swiff.Uploader.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Fx.ProgressBar.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Lang.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/FancyUpload2.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/js.js?t=2017061401"></script>
    <script src='https://www.google.com/recaptcha/api.js?hl=en'></script>
    </head>
    <body onload="$('ajaxloader').setStyle('display','none')"><div id="container">

    <div class="logo_container">
    <a href="/" id="logo" title="MWC - Home Page"><img src="/i/logo.png?40084" /></a>

    <div style="position:absolute; right:0; top:107px;" id="ajaxloader"><img src="/i/ajax-loader.gif" /></div>
    </div>
    <div class="menu_top">
    <a href="https://mywebsite.com/" title="Home Page"><h2>Home Page</h2></a>
    <a href="https://mywebsite.com/contact/" title="Contact"><h2>Contact</h2></a>
    <div class="clear"></div>
    </div>
    <div style="clear:both;"></div>
    <div style="padding:5px 0;"></div>

    <div id="content" ><h1>List of items: Example</h1><br><br>

    <div class="tableExample" style="padding-left:0;">
    <table class="formular">

    <tr>
    <th>Position</th>
    <th>Name of item</th>
    <th>Date added</th>
    </tr>
    <tr>
    <td>1</td>
    <td>John</td>
    <td>2017-07-14 19:19</td>
    </tr>
    <tr>
    <td>2</td>
    <td>Jane</td>
    <td>2017-07-14 19:30</td>
    </tr>
    <tr>
    <td>3</td>
    <td>Kelly</td>
    <td>2017-07-14 18:44</td>
    </tr>
    <tr>
    <td>4</td>
    <td>Michael</td>
    <td>2017-07-12 12:49</td>
    </tr>
    <tr>
    <td>5</td>
    <td>William</td>
    <td>2017-07-13 00:26</td>
    </tr>
    </table>
    </div>
    </div><script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-100774771-1', 'auto');
    ga('send', 'pageview');

    </script>
    </body>
    </html>

    Any ideas how I can achieve this?
    Thanks in advance!
     
  2. Zonlib

    Joined: Apr 15, 2014
    Posts: 33
    You can write a script that reads the HTML page as an XML document and selects the nodes named 'table'.
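
A rough sketch of that idea, written in Python purely for illustration (the thread itself shows no script code, so the function name `extract_table_rows` and the shortened `sample` string are made up here). Since a full page is often not well-formed XML (the page in the question has unclosed `<br>` tags, for instance), this version cuts out the first `<table>...</table>` block and parses only that fragment as XML. It assumes lowercase tag names, no nested tables, and well-formed markup inside the table. The same steps should carry over to a C# script in Unity.

```python
import re
import xml.etree.ElementTree as ET


def extract_table_rows(page_source):
    """Return the cell texts of the first <table> in page_source, one list per row."""
    # Cut out the first <table>...</table> block (assumes no nested tables).
    match = re.search(r"<table\b.*?</table>", page_source, re.DOTALL)
    if match is None:
        return []

    # Parse only the table fragment as XML and walk its <tr> nodes.
    table = ET.fromstring(match.group(0))
    rows = []
    for tr in table.iter("tr"):
        cells = [
            "".join(cell.itertext()).strip()
            for cell in tr
            if cell.tag in ("th", "td")
        ]
        rows.append(cells)
    return rows


# Shortened stand-in for the downloaded page source from the question.
sample = """<div id="content"><h1>List of items: Example</h1><br><br>
<table class="formular">
<tr><th>Position</th><th>Name of item</th><th>Date added</th></tr>
<tr><td>1</td><td>John</td><td>2017-07-14 19:19</td></tr>
<tr><td>2</td><td>Jane</td><td>2017-07-14 19:30</td></tr>
</table>
</div>"""

rows = extract_table_rows(sample)
for row in rows:
    print(row)
```

If the table markup itself turns out not to be well-formed, the XML parser will raise an error, and a tolerant HTML parser is probably the safer route.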
     
    Last edited: Jan 29, 2018
    InfinityCoder88 likes this.
  3. johne5

    Joined: Dec 4, 2011
    Posts: 1,091
    InfinityCoder88 likes this.
  4. Brathnann

    Joined: Aug 12, 2014
    Posts: 4,750
    InfinityCoder88 likes this.
  5. pandigital

    Joined: Mar 20, 2009
    Posts: 15
    Hi, can anyone point me to (or provide) a guide for getting AngleSharp working with Unity?
     
  6. Brathnann

    Joined: Aug 12, 2014
    Posts: 4,750