For the past few weeks I have been messing around on the site rentacoder.com. Most of my work at Sandia lately has been writing (documents in English), with some IT work and configuration thrown in. The site seems to be a cool way to make some extra cash ($20 so far, from only one successful bid) and have fun writing programs.
One potential customer wanted someone to manually go to a bunch of web pages and harvest contact information. I started an email conversation with the guy, mentioning that I thought there was a better way: automating it. So far I have a quick prototype that I put together using urllib and regular expressions in Python. If he picks up the project, I think I can find or create a better regular expression for email addresses and clean up the data. For now, I just wanted to mess with writing some sort of email harvester because I thought it would be fun (I have no aspirations towards becoming a spammer).
The code takes a list of fully qualified URLs, one per line. Here is the list the potential customer gave me:
http://weprintbarcodes.com
http://accstation.com
http://escan3d.com
http://edealsdepot.com
http://sandboxthreads.com
http://wildlifewonders.com
http://foreverbamboo.com
http://topsecretautomaticmoney.com
http://armormount.com
http://myjones.com
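For readers who want a feel for the approach, a minimal sketch of a urllib-plus-regex harvester might look like the following. It is not the actual grabber.py: it assumes Python 3 (`urllib.request`), and the names `fetch`, `harvest`, `read_urls`, and the deliberately loose pattern `\S+@\S+` are all made up for illustration.

```python
import re
from urllib.request import urlopen

# Deliberately loose pattern (an assumption, not the one in grabber.py):
# any run of non-whitespace characters that contains an "@".
LOOSE_EMAIL = re.compile(r"\S+@\S+")


def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def read_urls(path):
    """Read one fully qualified URL per line, ignoring blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def harvest(urls, fetcher=fetch):
    """Yield every loosely email-like token found on each page."""
    for url in urls:
        try:
            page = fetcher(url)
        except OSError as error:  # URLError and socket timeouts are OSErrors
            print(f"skipping {url}: {error}")
            continue
        yield from LOOSE_EMAIL.findall(page)
```

Something like `for hit in harvest(read_urls("urls.txt")): print(hit)` would drive it, where `urls.txt` is a hypothetical file holding the list above. A pattern this loose grabs any whitespace-delimited token containing an `@`, markup and all, so the hits need cleaning afterwards.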
Here are the results after running my program:
brian@ubuntu-bind:~/tmp/other_programs/rent_a_coder/web_grabber$ time ./grabber.py
customerservice@weprintbarcodes.com
href="mailto:feedback@edealsdepot.com">Contact
freebies@sandboxthreads.com
src="https://p10.secure.hostingprod.com/@sandboxthreads.com/ssl/ecomby_128bit2.gif"
src="https://p10.secure.hostingprod.com/@sandboxthreads.com/ssl/paypal.gif"
Sculpture","http://ep.yimg.com/ip/I/wildlifegifts_2055_31879747","795","-@NULL@-");var
href="mailto:info@wildlifewonders.com">info@wildlifewonders.com
real 0m9.527s
user 0m0.312s
sys 0m0.208s
brian@ubuntu-bind:~/tmp/other_programs/rent_a_coder/web_grabber$
Not too great, but pretty good for about an hour of work and a couple of questions to my friend Aaron, who is awesome at Python. If I ever seriously want to write an email scraper (either for myself or a customer), I'll get a better regular expression, clean the output up, make it multithreaded, and dump the email addresses to a database.
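A rough sketch of what the first three of those improvements might look like is below. Again, this is not grabber.py, just an illustration under some assumptions: Python 3, a stricter but still simplified address pattern (nowhere near a full RFC 5322 matcher), and made-up names `EMAIL`, `extract_emails`, `fetch`, and `harvest_all`. The thread pool is motivated by the timing above, where roughly 9.5 s of wall-clock time against about 0.3 s of user CPU time suggests the run is mostly spent waiting on the network. The database step is left out.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Stricter, still simplified pattern: local part, "@", dotted domain,
# and an alphabetic top-level domain of at least two letters.
EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(text):
    """Return unique, lowercased addresses in order of first appearance."""
    seen = {}  # dicts keep insertion order, so this doubles as an ordered set
    for match in EMAIL.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def fetch(url, timeout=10):
    """Return the page body as text, or an empty string if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError and socket timeouts
        return ""


def harvest_all(urls, fetcher=fetch, workers=8):
    """Fetch the pages concurrently and return the unique addresses found."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetcher, urls)  # results come back in input order
        return extract_emails("\n".join(pages))
```

Feeding the raw hits from the run above through `extract_emails` should strip the `href="mailto:` wrappers, drop the image URLs and the `-@NULL@-` token, and collapse the duplicate wildlifewonders.com address, leaving four clean addresses.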
I may or may not ever actually post the code to this one, depending on how the rentacoder.com sales go. If you would like to see the code, leave me a comment with how to get in touch with you.