
Extract HTML Table from Web Page source

Discussion in 'Scripting' started by RoyalCoder, Jan 29, 2018.

  1. RoyalCoder

    Joined:
    Oct 4, 2013
    Posts:
    301
    Hi my friends,

    I built a script in Unity that downloads the source of a specific web page and saves it to a text file. What I really want is to extract only the HTML table data from that page. In the example below, everything outside the table should be removed, and only the table data should be kept and extracted:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html lang="en" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
    <head>
    <link rel="shortcut icon" href="https://mywebsite.com/favicon.ico" />
    <title>My Website Com</title>
    <meta name="description" content="Ministerul pentru intreprinderi mici si mijlocii, comert, turism si profesii liberale"/>
    <meta name="keywords" content="My Web Site"/>
    <meta name="Language" content="en"/>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <meta name="rating" content="General" />
    <meta name="revisit-after" content="7 Days" />
    <meta name="robots" content="index,follow" />
    <link rel="shortcut icon" href="/favicon.ico" />
    <meta name="publisher" content="Unity Design" />
    <meta name="copyright" content="Copyright (c) Unity Design" />
    <meta name="author" content="Developed by Unity Design - www.UnityDesign.com" />
    <link href="/css/style.css?t=2017061401" rel="stylesheet" type="text/css" />
    <link href="/css/uploader.css?t=2017061401" rel="stylesheet" type="text/css" />
    <script type="text/javascript" src="/js/jquery-1.8.0.min.js?t=2017061401"></script>
    <script>
    var jQr = jQuery.noConflict();
    </script>
    <script type="text/javascript" src="/js/mootools-1.2.5-core-yc.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/mootools-1.2.5.1-more.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Swiff.Uploader.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Fx.ProgressBar.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Lang.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/FancyUpload2.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/js.js?t=2017061401"></script>
    <script src='https://www.google.com/recaptcha/api.js?hl=en'></script>
    </head>
    <body onload="$('ajaxloader').setStyle('display','none')"><div id="container">

    <div class="logo_container">
    <a href="/" id="logo" title="MWC - Home Page"><img src="/i/logo.png?40084" /></a>

    <div style="position:absolute; right:0; top:107px;" id="ajaxloader"><img src="/i/ajax-loader.gif" /></div>
    </div>
    <div class="menu_top">
    <a href="https://mywebsite.com/" title="Home Page"><h2>Home Page</h2></a>
    <a href="https://mywebsite.com/contact/" title="Contact"><h2>Contact</h2></a>
    <div class="clear"></div>
    </div>
    <div style="clear:both;"></div>
    <div style="padding:5px 0;"></div>

    <div id="content" ><h1>List of items: Example</h1><br><br>

    <div class="tableExample" style="padding-left:0;">
    <table class="formular">

    <tr>
    <th>Position</th>
    <th>Name of item</th>
    <th>Date added</th>
    </tr>
    <tr>
    <td>1</td>
    <td>John</td>
    <td>2017-07-14 19:19</td>
    </tr>
    <tr>
    <td>2</td>
    <td>Jane</td>
    <td>2017-07-14 19:30</td>
    </tr>
    <tr>
    <td>3</td>
    <td>Kelly</td>
    <td>2017-07-14 18:44</td>
    </tr>
    <tr>
    <td>4</td>
    <td>Michael</td>
    <td>2017-07-12 12:49</td>
    </tr>
    <tr>
    <td>5</td>
    <td>William</td>
    <td>2017-07-13 00:26</td>
    </tr>
    </table>
    </div>
    </div><script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-100774771-1', 'auto');
    ga('send', 'pageview');

    </script>
    </body>
    </html>

    Any ideas how I can achieve this?
    Thanks in advance!
     
  2. Zonlib

    Joined:
    Apr 15, 2014
    Posts:
    39
    You can write a script that reads the HTML page as an XML document and gets the node named 'table'.
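
    The idea above can be sketched in C# roughly as follows. This is a minimal sketch under stated assumptions, not the poster's actual script: the names HtmlTableExtractor and ExtractFirstTable are hypothetical, and it assumes System.Xml.Linq is available in your Unity scripting profile. A full HTML page is often not well-formed XML (the sample page has an unclosed <br>, for instance), so the sketch first cuts out the <table>...</table> fragment and parses only that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class HtmlTableExtractor
{
    // Returns one string[] per <tr>, holding the trimmed text of its <th>/<td> cells.
    // Assumes the first <table>...</table> fragment in the page is well-formed XML
    // and that the page contains no nested tables.
    public static List<string[]> ExtractFirstTable(string pageSource)
    {
        var rows = new List<string[]>();

        // Locate the table markup; everything outside it is ignored.
        int start = pageSource.IndexOf("<table", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return rows;
        int end = pageSource.IndexOf("</table>", start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return rows;
        string fragment = pageSource.Substring(start, end + "</table>".Length - start);

        // Parse only the fragment as XML; this throws XmlException if it is malformed.
        XElement table = XElement.Parse(fragment);

        // Descendants (rather than Elements) also finds rows wrapped in <tbody>.
        foreach (XElement tr in table.Descendants("tr"))
        {
            rows.Add(tr.Elements()
                       .Where(c => c.Name.LocalName == "th" || c.Name.LocalName == "td")
                       .Select(c => c.Value.Trim())
                       .ToArray());
        }
        return rows;
    }
}
```

    Called on the page source from the first post, this should return six rows (the header plus five entries); in Unity each row could be logged with Debug.Log(string.Join(" | ", row)). The sketch will fail on nested tables, unclosed tags inside the table, or HTML-only entities such as &nbsp;. A real HTML parser is the more robust route for those cases.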
     
    Last edited: Jan 29, 2018
    RoyalCoder likes this.
  3. johne5

    Joined:
    Dec 4, 2011
    Posts:
    1,133
    RoyalCoder likes this.
  4. Brathnann

    Joined:
    Aug 12, 2014
    Posts:
    7,188
    RoyalCoder likes this.
  5. pandigital

    Joined:
    Mar 20, 2009
    Posts:
    15
    Hi, can anyone point me to (or provide) a guide for getting AngleSharp working with Unity?
     
  6. Brathnann

    Joined:
    Aug 12, 2014
    Posts:
    7,188