1. What kind of coding languages/programming/software would typically sit behind online 'article spinners'? Some examples of article spinners:
http://www.spinrewriter.com
http://thebestspinner.com
Before I researched this question, I was unfamiliar with the term "article spinner." After researching it, I realized that the purpose of article spinning is to generate multiple copies of an article, each different enough to convince a search engine that they are not the same, which gives the search engine 'more content' to index. It is a way to game the search engine: it contributes nothing of value, but it can earn a higher search engine ranking (because you have 'more content').
Even though I try to create high-quality, interesting, funny articles, and I think spinning is total crap, there are still a lot of technical aspects to it that are pretty cool.
I believe the underlying mechanism that spinners use comes from probability, specifically random variables. A random variable represents a set of possible outcomes, each with a probability. For an article spinner, it would make sense to represent synonyms with a random variable, where the 'closer' the synonym, the higher its probability of being used as a replacement. For example, we could associate the word "car" with many terms. This can be expressed as
car = {(auto, 0.98), (automobile, 0.95), (truck, 0.80), (ford, 0.75), (chevy, 0.70) ... (boat, 0.001)}
Here the first value in each tuple is the term we want to substitute, and the second value is the associated probability (which has to be between 0 and 1). So substituting "auto" for "car" works in 98% of cases. For our spinner to work well, we need a LARGE number of terms we can substitute on (random variables) and a LARGE number of HIGH PROBABILITY substitutions for each of them. Think of the article spinner as playing Mad Libs, but with high-probability substitutions. The quality of the spinner depends on the quality of its synonym lists.
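Here is a minimal Python sketch of that substitution step. It assumes a plain dictionary for the synonym lists (the "fast" entry and its scores are made up for illustration), treats each score as an acceptability weight rather than a true probability (the scores don't sum to 1), and adds a hypothetical `threshold` parameter that throws out any substitution scoring below a minimum.

```python
import random

# Hypothetical synonym table modeled on the "car" example: each word maps to
# (synonym, score) pairs, where the score says how acceptable the swap is (0 to 1).
SYNONYMS = {
    "car": [("auto", 0.98), ("automobile", 0.95), ("truck", 0.80),
            ("ford", 0.75), ("chevy", 0.70), ("boat", 0.001)],
    "fast": [("quick", 0.95), ("speedy", 0.90), ("rapid", 0.85)],
}


def spin_word(word, synonyms, threshold, rng):
    """Return the word itself or one of its synonyms scoring at least `threshold`."""
    candidates = [(s, p) for s, p in synonyms.get(word.lower(), []) if p >= threshold]
    if not candidates:
        return word
    # Keep the original word in the running, weighted as a perfect match (1.0);
    # closer synonyms get proportionally more weight than distant ones.
    options = [word] + [s for s, _ in candidates]
    weights = [1.0] + [p for _, p in candidates]
    return rng.choices(options, weights=weights, k=1)[0]


def spin(text, synonyms, threshold=0.9, seed=None):
    """Spin an article word by word (naive whitespace split, no punctuation handling)."""
    rng = random.Random(seed)
    return " ".join(spin_word(w, synonyms, threshold, rng) for w in text.split())
```

For example, `[spin("the fast car", SYNONYMS, threshold=0.9) for _ in range(5)]` gives five spins that only ever use "auto" or "automobile" for "car"; pushing `threshold` toward 0 eventually lets "boat" through, which is the mad-lib end of the scale.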
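OK, so with term substitution outlined above, what does our software actually look like, conceptually? Picture a simple interface: a box for the input article, a "number of spins" knob that goes up to "Max," a second knob that sets how closely the spun articles should match the original (from 0 to 1), and a "Go!" button.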
Then you hit the "Go!" button, and the software makes a number of articles based on your input article (how many depends on how far you cranked the "number of spins" knob). Those articles land somewhere between 0 (a mad-lib, where the grammar of the language holds, but not much else) and 1 (where the articles make perfect sense, but use different words that mean EXACTLY the same thing). The output is a series of articles. "Max" is the maximum number of articles that can be generated, found by counting every combination of synonym choices for the words in the original article that the spinner knows how to substitute. Any spinner that guarantees 'infinite' spins is either:
- Ignorant of combinatorics
- Lying
- Performing very, very, very low-probability/acceptability substitutions on already-spun articles. This would result in a computer playing the game 'telephone', where the message is continuously distorted (which, mathematically speaking, still results in a finite number of articles, but one large enough for all practical purposes).
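The "Max" number can be computed directly. A minimal sketch, assuming a plain dictionary mapping each word to (synonym, score) pairs modeled on the "car" example, plus a hypothetical `threshold` score below which substitutions are ignored: each word either stays as-is or becomes one of its acceptable synonyms, so it contributes a factor of (1 + number of acceptable synonyms), and the article count is the product of those factors. Every factor is finite, so the product is too, which is the combinatorics behind the first bullet.

```python
# Hypothetical synonym table modeled on the "car" example above.
SYNONYMS = {
    "car": [("auto", 0.98), ("automobile", 0.95), ("truck", 0.80),
            ("ford", 0.75), ("chevy", 0.70), ("boat", 0.001)],
    "fast": [("quick", 0.95), ("speedy", 0.90), ("rapid", 0.85)],
}


def max_spins(text, synonyms, threshold):
    """Count the distinct articles a spinner could produce, original included.

    Each word either stays as-is or becomes one of its synonyms scoring at
    least `threshold`, so it contributes (1 + number of acceptable synonyms)
    choices; the total is the product of those choices over all words.
    """
    total = 1
    for word in text.split():
        acceptable = [s for s, p in synonyms.get(word.lower(), []) if p >= threshold]
        total *= 1 + len(acceptable)
    return total
```

Here `max_spins("the fast car", SYNONYMS, 0.9)` returns 9 (3 choices for "fast" times 3 for "car"), and lowering the threshold to 0.7 raises it to 24. Subtract 1 if the unchanged original shouldn't count as a spin.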
This describes the conceptual framework for how I imagine an article spinner works. Compared to the dudes I hang out with, I suck at math. Most of the guys I spend time with are super duper smart and good at programming, computer science, and math. I have a BS and an MS in Computer Science, which is sort of like getting an undergraduate minor in math, which isn't very much math. I'm all right, but please excuse me if my probability terms aren't perfect. I'm not sure about the second knob and using an actual random variable: I'm describing something similar to, but not exactly like, a random variable (for one thing, the scores in the "car" example don't add up to 1, so they aren't a proper probability distribution).
I imagine that these spinners are sold as website subscriptions, which means a large portion of their implementation is web programming: HTML, CSS, and JavaScript. However, it would make sense to implement the algorithm I described above server-side, so that no one can just read your JavaScript code and get access to your spinner logic.
As for programming languages, I'd like to implement this in a programming language called "Python," since I'm pretty good at writing programs in Python. Since it would need to be part of a web page, I'd probably take a look at a framework called "Django," which is not to be confused with probably the first Quentin Tarantino movie I ever liked, "Django Unchained." Basically, though, any server-side language would work. PHP is super popular as far as server-side languages go. One of my friends uses Java with the Tomcat servlet container for server-side work. All of this assumes someone is selling a web-based subscription to a spinner, used through a web page. If someone were selling a binary (an .EXE on Windows), then any language that compiles into an .EXE could be used, like C++ or any other compiled language.
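To show what "server-side" means here, a minimal sketch of a spinner endpoint. It uses Python's built-in `wsgiref` module instead of Django so it runs without installing anything; a Django view would play the same role. The form field names `article` and `spins`, the cap of 100 spins, and the `spin_article` placeholder are all assumptions. The point is that the browser only ever sees the JSON result, never the spinning logic.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

MAX_SPINS = 100  # hypothetical cap on the "number of spins" knob


def spin_article(text, count):
    """Placeholder for the real (secret) spinner logic; returns unchanged copies."""
    return [text] * count


def app(environ, start_response):
    """WSGI app: accepts a POSTed form with `article` and `spins`, returns JSON."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST only"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        spins = int(form.get("spins", ["1"])[0])
    except ValueError:  # also covers UnicodeDecodeError
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"bad request"]
    article = form.get("article", [""])[0]
    spins = max(1, min(spins, MAX_SPINS))  # clamp the knob to a sane range
    body = json.dumps({"articles": spin_article(article, spins)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    """Run a local development server on localhost (blocks until interrupted)."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and POSTing `article=hello+world&spins=3` to `http://127.0.0.1:8000/` returns three copies of the article as JSON; swapping the placeholder for real spinner code doesn't change anything the client can see.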
2. What languages/coding/software would be required if I wanted to add extra features, like bringing up URL lists of the top 10 YouTube videos, newspaper articles, or Flickr photos, all based on the main article keyword? E.g., an article with the keyword "good running shoes" would bring up links to good shoe videos, pictures, and online articles from reputable news sources (to get good quotes and statistics).
I actually built something like this for scraping yellowpages.com. For any of these things, you need a 'scraper', which is a piece of software that crawls through a target website and collects information. My scraper would accept a search term and a zip code, grab all the results, and put them in a database. For the most recent scraper I wrote, we only cared about yellowpages.com. I put the results into a MySQL database, so that different programs and/or web services could use the results from my scrapes.
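A stripped-down sketch of that kind of scraper, using only the Python standard library: `html.parser` for parsing and `sqlite3` standing in for MySQL so the example is self-contained. The `business-name` CSS class is a made-up placeholder; every site marks up its results differently, so the parser has to be written against the real page (and checked against the site's terms of use). The search URL is left to the caller, and fetching is kept separate from parsing so the parser can be tested on saved HTML.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the text of <a> links with the (hypothetical) class "business-name"."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "business-name" in classes:
            self._in_name = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "a" and self._in_name:
            # Collapse runs of whitespace left over from nested tags and line breaks.
            self.names[-1] = " ".join(self.names[-1].split())
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self.names[-1] += data


def fetch(url):
    """Download one results page as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_listings(html):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.names


def save(conn, term, zip_code, names):
    """Store results so other programs can query them later."""
    conn.execute("CREATE TABLE IF NOT EXISTS listings (term TEXT, zip TEXT, name TEXT)")
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?)",
                     [(term, zip_code, n) for n in names])
    conn.commit()


def scrape(url, term, zip_code, conn):
    """Fetch one results page for (term, zip_code) and store its listings."""
    names = parse_listings(fetch(url))
    save(conn, term, zip_code, names)
    return names
```

Switching to MySQL would mainly mean swapping the `sqlite3` connection for a MySQL driver's connection; the parsing side stays the same.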
Since YouTube, Google News, and Flickr all have different interfaces (ways of accepting input and displaying results), you'd need a scraper for each one. Scrapers may already exist, but I haven't had much luck finding existing ones, so I always write my own. You may be able to take advantage of RSS feeds if you're interested in very general top results, but for results on a specific term, I'm pretty sure you'd need a scraper or, where a site offers one, its official API (YouTube and Flickr both publish APIs with search features).
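For the RSS route, a minimal sketch using Python's built-in `xml.etree.ElementTree`. It assumes an RSS 2.0 feed (Atom feeds use a different, namespaced layout) and does a simple case-insensitive keyword match on item titles, which is only a rough filter compared to a real search. Feed URLs would be whatever the target site publishes; none are assumed here.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url):
    """Download a feed and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def parse_rss(xml_data, keyword=None, limit=10):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 feed.

    If `keyword` is given, keep only items whose title contains it
    (case-insensitive).
    """
    root = ET.fromstring(xml_data)
    results = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if keyword and keyword.lower() not in title.lower():
            continue
        results.append((title, link))
        if len(results) >= limit:
            break
    return results
```

Usage would look like `parse_rss(fetch_feed(feed_url), keyword="running shoes")`. For feeds you don't control, the Python documentation recommends hardened XML parsing (for example, the `defusedxml` package).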
I love using Python and BeautifulSoup for scraping. Putting the results into a database usually makes a lot of sense, since then any other program can use the power of the database to access the scraping results.
Things have been super, super busy with my software company, but I like this Fiverr gig. It gives me interesting questions to answer (for money) and helps me build my blog content (without article spinning!)