Title: [Python] Parsing text from a webpage
Post by: jakx on January 27, 2009, 06:58:55 PM

I am trying to write a program that takes a web page, finds every occurrence of a string the user enters, and prints the matches to standard output. As the topic says, I am doing this in Python because I want to learn the language as well as write this program. I have looked for documentation on functions like findall and search but could not find anything good. Here is what I have so far. Any suggestions would be great. Thanks.
Code:
import urllib, re

print "Enter The website: "
url = raw_input()
data = urllib.urlopen(url).read()
print "Please enter a topic for me to find: "
topic = raw_input()
# re.findall takes the pattern first, then the text to search
matches = re.findall(topic, data)
print matches

Title: Re: [Python] Parsing text from a webpage
Post by: adamj on January 28, 2009, 12:02:30 AM

I'm new to Python too, but how about this?
Same as yours, but it should strip out HTML tags.
Code:
import urllib2, re

def remove_html_tags(data):
    # Remove anything that looks like an HTML tag
    p = re.compile(r'<[^<]*?>')
    return p.sub('', data)

print "Enter The website: "
url = raw_input()
response = urllib2.urlopen(url)
data = remove_html_tags(response.read())
print "search word"
topic = raw_input()
matches = re.findall(topic, data)
print matches

Title: Re: [Python] Parsing text from a webpage
Post by: jakx on January 30, 2009, 10:45:45 AM

Awesome! Thanks for the input!
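
For anyone reading this thread with Python 3, where raw_input() became input() and urllib2 became urllib.request, the same fetch, strip, and search flow might look roughly like the sketch below. It keeps the remove_html_tags regex from the post above. The names find_topic, fetch, and main are made up for illustration, and decoding the page as UTF-8 is an assumption that will not hold for every site.

```python
import re
import urllib.request

# Same tag-matching regex as in the post above.
TAG_RE = re.compile(r"<[^<]*?>")


def remove_html_tags(data):
    """Drop anything that looks like an HTML tag."""
    return TAG_RE.sub("", data)


def find_topic(text, topic):
    """Return every match of the pattern `topic` in `text`."""
    # re.findall takes the pattern first and the string to search second.
    # As in the thread, `topic` is treated as a regular expression.
    return re.findall(topic, text)


def fetch(url):
    """Download a page and decode it as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def main():
    """Interactive version of the script from the thread."""
    url = input("Enter the website: ")
    topic = input("Please enter a topic for me to find: ")
    print(find_topic(remove_html_tags(fetch(url)), topic))
```

Calling main() runs it interactively. The two helper functions work on plain strings, so they can be tried without touching the network, for example find_topic(remove_html_tags("<p>spam</p>"), "spam").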

Title: Re: [Python] Parsing text from a webpage
Post by: munkeyfreenix .batcat on March 11, 2009, 06:49:50 PM

One thing to keep in mind about your script regarding secure practices is how you use raw_input.
In my experience, raw_input() is much better than plain input(), which in Python 2 evaluates whatever the user types as a Python expression. But you MUST still run checks on the result and scrub the data; otherwise your program will be buggy at best and, most likely, insecure.

Title: Re: [Python] Parsing text from a webpage
Post by: geo on March 14, 2009, 05:44:31 AM

I think you should rely on an HTML parser or on XPath instead. I wrote an article last month about web scraping techniques: http://ssscripting.wordpress.com/2009/02/15/web-scraping-techniques/ . Even though the code samples are written in Ruby, you can use BeautifulSoup to do the same type of scraping in Python.
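
The last two suggestions (check the user's input, and use a real HTML parser rather than a regex) could be combined along the lines of the sketch below. It is written for Python 3 and, to stay dependency-free, swaps in the standard library's html.parser module in place of BeautifulSoup, so it is not the approach from the linked article. The names check_url, TextExtractor, page_text, and find_literal are hypothetical, and the rule "only http and https URLs are allowed" is just one possible policy.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


def check_url(url):
    """Accept only http(s) URLs that name a host; raise ValueError otherwise."""
    url = url.strip()
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("expected an http:// or https:// URL")
    return url


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def page_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def find_literal(text, topic):
    """Find `topic` as literal text; re.escape keeps characters such as
    '+' or '(' from being interpreted as regex syntax."""
    return re.findall(re.escape(topic), text)
```

Unlike the regex version, the parser does not get confused by a "<" inside a script block, and escaping the search word means a topic such as "C++" is searched for literally instead of being read as a pattern.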