For the past three days I have been working on a project to fetch results from the VTU website. The problem was posed by Girish Sir, and it comes up for all staff members every time VTU results are announced. I used two Python modules: BeautifulSoup for the parsing and mechanize for the connections. Here is an overview of the outcome.
The Python mechanize module (https://pypi.python.org/pypi/mechanize/) is an extremely simple yet powerful tool for fetching data from websites.
Its interface closely mirrors how we interact with a website in a browser. The VTU results page has a form named "new" that contains an "input" tag called "rid". So we need to select that form, set the value of that input tag to our USN, and then submit the form. Here is the code to do that.
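A minimal sketch of these steps, assuming a recent mechanize release. The function names submit_usn and fetch_result_page are made up for illustration, and the URL and USN have to be supplied by the caller since neither is given in this post.

```python
def submit_usn(br, usn):
    """Fill in the results form on the page already open in `br` and submit it."""
    br.select_form(name="new")  # the form that holds the USN field
    br["rid"] = usn             # "rid" is the input tag inside that form
    return br.submit()          # response object for the result page


def fetch_result_page(usn, url):
    """Open `url` in a fresh mechanize browser and submit `usn` to it."""
    import mechanize  # imported here so submit_usn can be tried without it

    br = mechanize.Browser()
    br.open(url)
    return submit_usn(br, usn)
```

Calling resp = fetch_result_page(usn, url) with the address of the results page and a valid USN should return the response described next.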
By the way, if the page has multiple forms, br.forms() is the way to go: it lets you iterate over all of them and pick the one you need.
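As a rough sketch of that idea, the helper below (list_forms is a made-up name) walks over br.forms() and reports each form's position and name, assuming br is a mechanize.Browser that already has a page open.

```python
def list_forms(br):
    """Return (index, name) for every form on the page currently open in `br`."""
    # form.name is None when the form tag has no name attribute
    return [(nr, form.name) for nr, form in enumerate(br.forms())]
```

Once you know the position of the form you want, br.select_form(nr=1) selects it by index instead of by name.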
The response is a new page containing the result, and it is stored in the "resp" variable. We can now use BeautifulSoup to parse this page and extract the results.
1) Get the HTML of the page with resp.get_data().
2) Create a new BeautifulSoup object from this HTML.
3) The result table sits inside a "td" tag with width "513". Note that this width has to be unique on the page for the lookup to work, and in this case it is. There are in fact several tables within this "td". Get all of them.
4) The result table is the second one, so tables[1] is the result table. It has a number of rows and columns. Find them all, fetch the data in them, and store it in a list. The code is self-explanatory.
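The four steps can be sketched roughly as below. This assumes the bs4 package (BeautifulSoup 4) with the built-in html.parser, and parse_result is a made-up helper name. The page layout it expects is only what is described above: a "td" of width "513" whose second table holds the results.

```python
from bs4 import BeautifulSoup


def parse_result(html):
    """Return the rows of the result table as a list of lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")      # step 2
    cell = soup.find("td", width="513")            # step 3: the enclosing cell
    if cell is None:
        return []
    tables = cell.find_all("table")                # every table inside that cell
    if len(tables) < 2:
        return []
    rows = []
    for tr in tables[1].find_all("tr"):            # step 4: second table
        cols = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cols:                                   # skip rows without td cells
            rows.append(cols)
    return rows
```

With the response from the form submission, step 1 becomes rows = parse_result(resp.get_data()). An empty list means the expected cell or table was not found, which can happen for an invalid USN or if the page layout changes.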
Python ROCKS!!
3 comments:
That's very useful sir. Thank you.
Good one man... you are making "us", the so-called "IT" professionals, look amateur.
Thank you sir, it's a very useful blog.
But could you explain the procedure to implement this in a website, maybe by giving the complete code, so that it stays useful?