Thursday, November 28, 2013

Ask A Programmer - Part 2

I received two more questions from a Fiverr gig I have called "Answer A Programming Questions In Written English."

1. What kind of coding languages/programming/software would typically sit behind online 'article spinners'? Some examples of article spinners:

       http://www.spinrewriter.com
       http://thebestspinner.com

Before I researched this question, I was unfamiliar with the term "article spinner." After researching that term, I realized that the purpose of article spinning is to generate multiple copies of an article, all of which are different enough to convince a search engine that they are not the same - which results in having 'more content' for a search engine to index. This is a way to game the search engine, and actually contributes nothing of value, but is a strategy to get a higher search engine ranking (because you have 'more content.')

Despite the fact that I try to create high-quality, interesting, funny articles, and think spinning is total crap, there are still numerous technical aspects to it that are pretty cool.

I believe that the underlying mechanism that spinners use is found in probability, specifically random variables. A random variable represents a number of outcomes, each with a probability. For an article spinner, it would make sense that synonyms can be represented with a random variable, and the 'closer' the synonym, the higher the probability of its replacement. For example, we could have the word "car" as a random variable associated with many terms. This can be expressed as:

car = {(auto, 0.98), (automobile, 0.95), (truck, 0.80), (ford, 0.75), (chevy, 0.70) ... (boat, 0.001)}

Where the first value in the tuple () is the term we want to substitute, and the second value is the associated probability (which has to be between 0-1.) So, substituting "auto" for "car" can work in 98% of cases. For our spinner to work well, we need to have a LARGE number of terms we can substitute on (random variables) and a LARGE number of HIGH PROBABILITY substitutions to make. Think of the article spinner playing mad-libs, but with high probability substitutions. The quality of the spinner is going to depend on the quality of the synonym lists.
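Here is a minimal Python sketch of that substitution idea. It is not how any real spinner is known to work. The SYNONYMS table, the spin function, and the min_score cutoff are all names I made up for illustration, and the sketch ignores punctuation and grammar entirely.

```python
import random

# Hypothetical synonym table in the shape described above:
# term -> list of (substitute, score between 0 and 1).
SYNONYMS = {
    "car": [("auto", 0.98), ("automobile", 0.95), ("truck", 0.80), ("boat", 0.001)],
    "fast": [("quick", 0.97), ("rapid", 0.90)],
}

def spin(text, min_score=0.9, rng=None):
    """Swap each known word for a substitute scoring at least min_score."""
    rng = rng or random.Random()
    out = []
    for word in text.split():
        candidates = [(s, p) for s, p in SYNONYMS.get(word.lower(), []) if p >= min_score]
        if candidates:
            subs, weights = zip(*candidates)
            # Higher-scoring substitutes are picked more often.
            word = rng.choices(subs, weights=weights, k=1)[0]
        out.append(word)
    return " ".join(out)

print(spin("my car is fast"))
```

Lowering min_score lets worse substitutions like "boat" through, which is the mad-libs end of the scale.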

OK, so we have our term substitution outlined above. What does our software conceptually look like? Something like this:


And then you hit the "Go!" button, and the software makes a number of articles based on your input article (with how many depending on how far you cranked the "number of spins" knob), and those articles match somewhere between 0 (with this being a mad-lib, where the grammar of the language holds, but not much else) and 1 (where the articles perfectly make sense, but use different words that mean EXACTLY the same thing.) The output is a series of articles. "Max" is going to be the maximum articles that can be generated by counting the permutations on the synonyms found in the spinner and original article. Any spinner that guarantees 'infinite' spins is either:
  1. Ignorant of combinatorics
  2. Lying
  3. Performing very, very, very, low probability/acceptability substitutions on already spun articles. This would result in a computer playing the game 'telephone' where the message is continuously distorted (which, mathematically speaking, still results in a finite number of articles, but would be large enough for all practical purposes.)

This describes the conceptual framework for how I imagine an article spinner works. Compared to the dudes I hang out with, I suck at math. Most of the guys I spend time with are super, duper, smart and good at programming, computer science, and math. I have a BS and MS in Computer Science, which is sort of like getting an undergraduate minor in math, which isn't very much math. I'm all right, but please excuse me if my probability terms aren't perfect. I'm not sure about the second knob and using an actual random variable - I'm describing something similar to, but not exactly like, a random variable.
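To make the "Max" idea concrete, here is a rough sketch of the counting. It assumes each word in the article is either kept or swapped for one allowed substitute, independently of the others, and that every combination gives a distinct article. The max_spins function and the sample table are hypothetical.

```python
from math import prod

def max_spins(words, synonyms, min_score=0.0):
    """Count articles: each word is kept or swapped for one allowed substitute."""
    return prod(
        1 + sum(1 for _, score in synonyms.get(w.lower(), []) if score >= min_score)
        for w in words
    )

# Made-up sample table, same shape as the "car" example above.
synonyms = {
    "car": [("auto", 0.98), ("automobile", 0.95), ("truck", 0.80)],
    "fast": [("quick", 0.97), ("rapid", 0.90)],
}
article = "my car is fast".split()
print(max_spins(article, synonyms))       # 4 * 3 = 12, counting the original
print(max_spins(article, synonyms, 0.9))  # 3 * 3 = 9
```

The count grows very quickly with article length, but it is always a finite product, never infinite.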

I imagine that these spinners are subscriptions to websites, which means that a large portion of their implementation is web programming: HTML, CSS, and JavaScript. However, the algorithm I described above would make sense to implement server-side, so that someone cannot just look at your JavaScript code and have access to your spinner logic.

With regard to programming languages, I'd like to implement this in a programming language called "Python" since I'm pretty good at writing programs in Python. Since it would need to be part of a web page, I'd probably take a look at a framework called "Django" which is not to be confused with probably the first Quentin Tarantino movie I ever liked, called "Django Unchained." Basically though, any kind of server-side language would work. PHP is super popular as far as server-side languages go. One of my friends uses Java with the Tomcat server for server-side work. This is all based on the idea that someone is selling a web-based subscription to a spinner, as used through a web page. If someone were selling a binary (.EXE on Windows) then any language which compiles into an .EXE could be used - like C++, or any compiled language.
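As a rough illustration of keeping the spinner logic on the server, here is a tiny sketch using Python's standard WSGI interface instead of Django, just to stay self-contained. The app and spin functions, the text query parameter, and the two-word synonym table are all assumptions for the example.

```python
from urllib.parse import parse_qs

# Hypothetical synonym table; it lives on the server and is never sent to the browser.
SYNONYMS = {"car": "auto", "fast": "quick"}

def spin(text):
    """Toy stand-in for the real spinner logic."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())

def app(environ, start_response):
    """WSGI entry point: reads ?text=... and returns the spun text."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    body = spin(text).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# To try it locally (blocks until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

The browser only ever sees the input article and the spun result. The synonym lists, which are the valuable part, stay on the server.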

2. What languages/coding/software would be required if I wanted to add extra features like bring up url lists of top 10 youtube videos, or newspaper articles, or Flickr photos - all based on the main article keyword. Eg. An article with a keyword of "good running shoes" would bring up links for good shoe videos and pictures and online articles from reputable news sources (to get good quotes and statistics).

I actually built something like this for scraping yellowpages.com. My scraper would accept a term and a zip code, and then grab all the results and put the results in a database. For any of these things, you need a 'scraper' which is a piece of software that crawls through a target website, and collects information. For the most recent scraper I wrote, we only cared about yellowpages.com. I would put the results into a MySQL database, so that different programs and/or web services could use the results from my scrapes.
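The storage half of a scraper like that can be sketched in a few lines. This is not my actual yellowpages.com code. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere, and the table layout, the save_listings function, and the sample data are all made up.

```python
import sqlite3

def save_listings(conn, term, zip_code, listings):
    """Store scraped (name, phone) pairs with the search that produced them."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               term TEXT, zip_code TEXT, name TEXT, phone TEXT,
               UNIQUE (term, zip_code, name, phone)
           )"""
    )
    # INSERT OR IGNORE keeps a repeated scrape from duplicating rows.
    conn.executemany(
        "INSERT OR IGNORE INTO listings VALUES (?, ?, ?, ?)",
        [(term, zip_code, name, phone) for name, phone in listings],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
# Made-up sample results, standing in for what a scrape would return.
scraped = [("Acme Plumbing", "555-0100"), ("Best Pipes", "555-0101")]
save_listings(conn, "plumber", "12345", scraped)
save_listings(conn, "plumber", "12345", scraped)  # second run adds nothing
print(conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0])
```

Once the rows are in a table, any other program or web service can query them without knowing anything about the scraper.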

Since YouTube, Google News, and Flickr all have different interfaces (ways of accepting input, and displaying results) you'd need a scraper for each one. Scrapers may exist, but I haven't had very good luck in finding existing scrapers, so I always write my own. You may be able to take advantage of RSS feeds if you were interested in very general top results, but for results for a specific term, I'm pretty sure you'd need a scraper.

I love using Python and BeautifulSoup for scraping. Putting the results into a database makes a lot of sense usually, since then any other program can use the power of a database to access the scraping results.
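The parsing step looks roughly like the sketch below. To keep it runnable without installing anything, it uses the standard library's html.parser rather than BeautifulSoup (with BeautifulSoup, the same extraction would be a short selector call such as soup.select("span.name")). The markup, the class names, and the ListingParser class are invented for the example and do not match any real site.

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collect the text of every <span class="name"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs.
        if tag == "span" and ("class", "name") in attrs:
            self._in_name = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_name = False

    def handle_data(self, data):
        if self._in_name and data.strip():
            self.names.append(data.strip())

# Made-up markup standing in for a downloaded results page.
PAGE = """
<div class="result"><span class="name">Acme Plumbing</span><span class="phone">555-0100</span></div>
<div class="result"><span class="name">Best Pipes</span><span class="phone">555-0101</span></div>
"""

parser = ListingParser()
parser.feed(PAGE)
print(parser.names)
```

A real scraper would download the page first and would need its own parsing rules for each target site, since every site lays out its results differently.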

Things have been super, super busy with my software company, but I like this Fiverr gig. It gives me interesting questions to answer (for money) and helps me build my blog content (without article spinning!)

Wednesday, November 6, 2013

417.5 Princeton = Available

Back in July, I wrote a blog post about the apartment I was living in being available for rent. This was 417 "B" Princeton. One major reason I moved out of my apartment, and significantly downgraded my standard of living, was the knowledge that my tenant in 417.5 Princeton might move out at any time. The tenant at 417.5 Princeton has rented from me for three years, performs beautiful landscaping, is almost never here, does basic repairs, always pays on time, and is basically the best possible tenant anyone could ask for. As a result of all these positive factors, I agreed to let him go month-to-month which is something I have never done before due to the uncertainty it introduces into my rental business. He has decided to cancel his lease at the end of November, so this posting shows the apartment I have available.

The apartment is a one-bedroom located on the second story. It rents for $560 / month. I pay for water, sewage and garbage. The new tenant would need to pay for electricity and gas. Internet access is available to share for an extra $20 / month. No pets.



 As you approach the apartment, you can see the newly rebuilt slat wood fence. When I purchased this property, about half the fence was chain link. Now there is no chain link fence anywhere on my property.

I always liked second floors - there are never any roaches, and it's nice looking out over everyone. 


 From outside the unit, you can see a clothesline...
 ...and my hot tub.


 The balcony / entrance walks into the kitchen.

 The kitchen sink was recently replaced (by me) with a top of the line Kohler sink.


Past the kitchen is the living room. 




The living room connects to a hallway, which leads to the bedroom and the bathroom. 



 I like all the windows in the bedroom.


 The apartment has ample closet space.



In the hallway, between the bedroom and the bathroom, there is a shelf with a lot of storage space.




 The outside of the apartment looks like a garden, because of the previous tenant.

Basically, all of my money is invested into this property, and I'm trying to have tenants that will help that investment grow, rather than destroy it. Until August, I lived on the property, and I plan on living on the property (in a different apartment) again very soon.

Thanks to my girlfriend Lyric Hammonds for helping me steam clean the carpets, do touch up painting, and take such nice pictures!