Download posts/comments from PL with this program!

enemyofthestate1776 Tue, 12/22/2015 - 19:13

Hi all,
I am putting this on hold for now, as it has been claimed that my program took down the site.

1. I am incredibly sorry to all of you, Michael and Jon included, if this is indeed the case.

2. I have reasons to believe that it is not the case, as I explain in the comment below (posted under My View).

Be 2) as it may, if I am wrong about this, my apology stands.

OK - My View

I had hoped this program would be a godsend to those who wished to save their posts (like Kathleen Gee, who simply wanted to retrieve her posts so she could re-post them on her blog).

However, Jon is blaming my program for the downing of PL, and I feel the need to communicate my view here. First, I would like to state that I have never had a problem with Jon in any way, and found him to be one of the best mods on the site over the years.

Second, I would like to point out that there were, in total, 10 downloads of my program (the total is actually 13 at the time of this message, but 3 of them were me, testing on other computers).

The way the software worked was that you would log in, then type in the username of the user whose posts you wanted to download.

The software would then issue 1 webpage request to PL, for the user's profile page.

If the page existed (and thus the user existed too), the program would send 1 more webpage request to PL, asking for their user/posts page.

It would then use the returned data to determine how many pages of posts existed for that user, and place a request for each page of posts on a request queue: 24 pages of posts, 24 requests. Only 6 could run at one time, though. That limit was not my decision, although I was comfortable with it, as I didn't want the site to go down from my program flooding the server, DDoS-style. Wow, if this claim is true, I've discovered a way to take down websites with only 10 copies of a program... Hmmmm.

The real reason for the limit of 6 requests at a time is that I built my software with the Qt application framework. Qt, I assume to avoid crashing websites via DDoS-like floods of simultaneous HTTP requests, has a hard limit (6) on how many HTTP requests can run simultaneously against one host. The rest are queued.

"Note: QNetworkAccessManager queues the requests it receives. The number of requests executed in parallel is dependent on the protocol. Currently, for the HTTP protocol on desktop platforms, 6 requests are executed in parallel for one host/port combination." - http://doc.qt.io/qt-5/qnetworkaccessmanager.html#details

(There is, of course, a way around that limit, but I didn't use it: I used only one QNetworkAccessManager. So, please, anyone who codes: my code is still up at https://github.com/team2e16/PostRetrievePL. Download it, check it, and either tell me I'm wrong, or...)
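For anyone who'd rather see the shape of it than dig through the repo, here is a minimal sketch of that request pattern, written from memory; it is not the actual PostRetrievePL code, and the URL scheme is invented for illustration:

```cpp
// Minimal sketch (NOT the actual PostRetrievePL code) of issuing all
// page requests through a single QNetworkAccessManager. Qt itself runs
// at most 6 of them in parallel per host; the rest wait on its queue.
#include <QCoreApplication>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    QNetworkAccessManager manager;  // one manager for the whole program

    int pending = 24;               // e.g. 24 pages of posts -> 24 requests
    for (int page = 1; page <= 24; ++page) {
        // Hypothetical URL scheme, for illustration only.
        QUrl url(QStringLiteral("http://popularliberty.com/user/example/posts?page=%1").arg(page));
        QNetworkReply *reply = manager.get(QNetworkRequest(url));
        QObject::connect(reply, &QNetworkReply::finished, [reply, &pending, &app]() {
            reply->deleteLater();
            if (--pending == 0)     // all 24 replies received
                app.quit();
        });
    }
    // Despite the loop enqueueing 24 requests at once, only 6 are ever
    // in flight at a time against one host/port combination.
    return app.exec();
}
```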

After it retrieved each page (6 at a time, always waiting for the website to reply before issuing new requests), it would use the returned webpage data to find a link to each post on that page and build a list of those links to download later. I was so worried about crashing the site that I deliberately avoided downloading each post immediately; I built the list of links instead, because I wanted the PL website to be finished with the previous requests before I issued more.
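That link-collection step amounts to something like the following sketch (the regular expression is hypothetical, not PL's actual markup; the real parsing is in the repo above):

```cpp
// Sketch of the link-collection step: scan one downloaded posts page
// for post URLs and save them for a later batch, instead of requesting
// each post immediately. The href pattern here is an assumption.
#include <QRegularExpression>
#include <QString>
#include <QStringList>

QStringList collectPostLinks(const QString &pageHtml)
{
    QStringList links;
    QRegularExpression re(QStringLiteral("<a href=\"(/\\d+/[^\"]+)\""));
    QRegularExpressionMatchIterator it = re.globalMatch(pageHtml);
    while (it.hasNext())
        links << it.next().captured(1);  // queued for download later
    return links;
}
```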

To that end, at each stage the software ran a 5-second timer that checked whether all outstanding requests had been answered by the website. If they had, the program continued to the next stage; if they hadn't, it waited another 5 seconds and checked again.
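A minimal sketch of that gate, assuming a pendingReplies counter that each reply handler decrements (the names and signature are mine, not taken from the repo):

```cpp
// Sketch of the 5-second polling gate between stages: re-check every
// 5 seconds and only call the next stage once every request has been
// answered. Assumes pendingReplies outlives the timer.
#include <QObject>
#include <QTimer>
#include <functional>

void waitForStage(QObject *parent, const int &pendingReplies,
                  std::function<void()> nextStage)
{
    QTimer *timer = new QTimer(parent);
    QObject::connect(timer, &QTimer::timeout,
                     [timer, &pendingReplies, nextStage]() {
        if (pendingReplies == 0) {   // every request has been replied to
            timer->stop();
            timer->deleteLater();
            nextStage();             // move on to the next stage
        }
        // otherwise: wait another 5 seconds and check again
    });
    timer->start(5000);              // 5,000 ms = 5 seconds
}
```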

The next stage sent the webpage requests from the posts list to the website, again six at a time, and again waiting for the website to reply. If there were 24 pages of posts, the total number of requests would be roughly 240 (10 posts per page). But again, that's not 240 at once: it's 6, then wait for the reply (just as your browser does), then 6 more, then wait, and so on. That's why it took me about 15 minutes to download Emalvini's entire posts/comments: the program doesn't, and in fact can't, generate more than 6 concurrent HTTP requests.

Then the program would move on to comments.

a) Request the user's comments page and check how many pages of comments exist.

b) The comments were stored directly on each comments page, so if a user had 10 pages of comments, my program made only 10 more requests. The highest number I saw while downloading was approximately 100 pages, so about 1,000 comments, but only 100 HTTP requests.

So, a practical example:

I downloaded Emalvini's entire posting and commenting history: 1 HTTP request for his profile page, 1 for his posts page, around 34 requests for his pages of posts, 340 requests for the posts themselves, and about 100 requests for his 1,000 or so comments (10 per page).

So that's 476 requests. It took around 15 minutes, from memory. I went away and made coffee, talked to my kids, and came back. 15 minutes is 900 seconds.

476 requests in 900 seconds is: 0.53 requests per second. So about 1 request every two seconds.
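Laid out as a single calculation (same numbers as above):

```latex
% Request total and average rate for the Emalvini example
\[
  1 + 1 + 34 + 340 + 100 = 476 \text{ requests}, \qquad
  \frac{476 \text{ requests}}{900 \text{ s}} \approx 0.53 \text{ requests/s}
\]
```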

The main reason for this is not that my program couldn't send requests faster (at least in blocks of 6). The real limit on how many requests I could make was how fast the PL website replied: only 6 requests were ever in flight, and each reply from PL took around 0.5-2 seconds to arrive.

Now let's, for the sake of argument, assume that my program was issuing requests three times as fast as this, i.e. that it took only 5 minutes to download Emalvini's stuff.

So now there would be 1.59 HTTP requests per second. Scary stuff, indeed. I'm pretty sure I could beat that by splitting Chrome into two separate windows and clicking refresh every second...

And now add in the other 10 copies of the program that people downloaded.

Let's assume:

1) They all got it to work (doubtful, because shortly before the site went down I posted links to an updated version of the software, since the original wasn't working for Windows 7 users)

2) They were on their computers from the moment they got the software until the moment of the crash, constantly feeding in new usernames to download posts/comments from, without any breaks whatsoever.

3) They also managed 1.6 requests per second.

And we now have a combined total of 17.6 requests per second. A very, very worst-case scenario. What was it that Jon said in his email to Michael?

"Multiply this by a few people and there are hundreds of heavy requests a second, causing all issues being logged on the server."

If hundreds of people had downloaded my software, then yes, this could indeed be the case. But only 10 did...

Finally, if we assume that each error message is around 300 bytes long, then at the calculated rate, 11 users (including myself) running continuously could rack up around 5KB per second of error messages: 18MB per hour, 432MB per day, or around 0.9GB of error messages between the time my program was released and the time the site went down, assuming that every single HTTP request was logged as an error. I don't see how that could happen, unless my program wasn't actually working. It was working, and I have 40 users' posts and comments to prove it. Total size of all of those posts and comments? 30MB...
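For anyone checking those figures (the roughly two days between the program's release and the crash is my inference from the 0.9GB total, not something stated above):

```latex
% Worst-case error-log growth: 17.6 requests/s, ~300 bytes per error
\begin{align*}
  17.6 \text{ req/s} \times 300 \text{ B} &\approx 5 \text{ KB/s}\\
  5 \text{ KB/s} \times 3600 \text{ s/hour} &= 18 \text{ MB/hour}\\
  18 \text{ MB/hour} \times 24 &= 432 \text{ MB/day}\\
  432 \text{ MB/day} \times \sim 2 \text{ days} &\approx 0.9 \text{ GB}
\end{align*}
```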

Second Issue:

Jon 'quoted' me in his email. This is what he sent to Michael:

'They even stated "you're going to have to wait, this slows things down..."'

This is completely untrue. I have a copy of my original post (thanks to my program, lol). This is what I said:

"Press the 'Gather Posts and Comments' button. Patience is required here. The program now sifts through all of the users posts and comments pages, and then downloads and extracts the information from each one."

Now, I'll give Jon the benefit of the doubt and assume he's just stressed, fixing a server on Christmas Eve, and didn't get what I meant.

I didn't mean that the site would slow down. I meant that it takes time to download all of the relevant pages from PL, precisely because instead of requesting every single page at once, my program grabs them 6 at a time and then waits out PL's reply delay.

However:

One thing that is possible is that, while I was working on my program, I may have generated many error messages. Also, after I finished my first program, I was rushing out a second one that could download entire threads (not by user, but by downloading a post and the entire conversation in the comments below it, and stitching it all into one monolithic HTML page).

It is possible that this is the case: that my testing of the software caused the problem. But again, I can't see any conceivable way, whether via ten users plus me or via my testing, that my software generated 'hundreds of requests per second'.

So in order for this to be true, 1) the available hard drive space must already have been very low, and 2) the software (Drupal, in PL's case) must not have sent a warning email to the responsible parties about the impending problems (hard drive nearly full, lots of errors).

My theory:

I assume that I am correct in the above and haven't overlooked something. (Possible, but I don't think so.)

I assume that there is no funny business going on with Michael/Jon. I want to make this clear. I don't think they're pulling the plug early for some unknown reason.

I assume that, given the site was to shut down within a week anyway, no further hard drive space was going to be supplied to it, and that Michael may have requested the site be 'drawn down' slowly, to save costs or whatever, making it possible for my program (and perhaps the influx of members we hadn't seen in quite some time, some names I'd never seen, after the shutdown announcement) to push it over the brink.

Again, if my software, or my testing of it, caused the site to go down, and I am solely responsible for prematurely downing a well-equipped (sufficient hard drive space) and much-loved website at the cusp of its imminent shutdown, I am sincerely sorry to Michael, Jon, and the entire DP/PL community.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry
Bitcoin SEO Technologies:

to help deal with a problem created by the DP founder. 

Several of us noticed that the error on the failing database indicated an out of memory problem on the server. I grabbed a screenshot of it. One can fix that.

So rumors are spreading that it wasn't Michael's hand that shut down PopularLiberty. I rather think enemyofthestate1776 should consider himself privileged to get the very last heaping dollop of Nystrom Drama™.

Out with the old, on with the new.

BitcoinSEOTechnologies provides professional SEO services for growing the bitcoin business community.

Libera_me:

for both Popular Liberty and Daily Paul. Thank you so much.

Speak up for those who cannot speak up for themselves, for the rights of all who are destitute. Speak up and judge fairly, defend the rights of the poor and the needy.~~ Prov.30: 8 & 9

Bitcoin SEO Technologies:

And I have some good news.

About the Google search index: "Web sites or web pages which are not accessible remain in the index for up to a month. This is done to prevent a temporary outage from impacting a web site's position in the Search index."

Which means we may have up to an extra month to gather those gems. You just have to deal with the complete lack of formatting.

BitcoinSEOTechnologies provides professional SEO services for growing the bitcoin business community.

Bitcoin SEO Technologies:

There is still a cached copy of most PopularLiberty pages on Google itself. Try putting this exact string into the Google search box.

site:popularliberty.com "by Kathleen Gee"

She shows up around 1,990 times with either a post or a comment.

You can add a space and more terms after the username to search for particular posts.

site:popularliberty.com "by Kathleen Gee" help importing my posts

Or take out the "by" to get even more pages with posts OR comments or any mention of that username. Sometime it has been mentioned in the sidebar under Recent Comments and it shows up on this search. 

Next to each URL is a dropdown to get to the cached snapshot of that page. I don't know how long the cached copy stays around for a site that's gone entirely 404, but it's there for now. It's not pretty, as it looks like the CSS style sheet isn't loading, but it's there, it's all there.

BitcoinSEOTechnologies provides professional SEO services for growing the bitcoin business community.

Liberty Pastor:

but only got a listing of who posted when and how many replies there were, who made the last reply and when they made it.

"...where the Spirit of the Lord is, there is liberty." 2 Corinthians 3:17

deacon:

But because the site is down, I got the error message when trying to open the site using my bookmarked pages. All I was able to do was find it using Google.

.

Bitcoin SEO Technologies:

you must not click the usual link; use the dropdown menu at the end of the URL instead. There you'll find the cache snapshot.

BitcoinSEOTechnologies provides professional SEO services for growing the bitcoin business community.

Ron Aldof:

I've been posting this site's address on the walls of pro-liberty FB groups, hoping to draw in badly needed new blood so we can expand. The block on new members at PL might have been a preview of it shutting down.

Nothing will grow if we just keep talking to each other.

#IStandWithRandAndLiberty!
Let's get back to Liberty and Freedom.
Trump is working with the Establishment.

tony m:

Hey, if your program is the culprit, it's not like it was a negligent thing.

Seems this is the new DP. Have to get used to it.

.

deacon:

caused a massive drain on resources, then wouldn't just stopping its use put PL back to where it was before it started?

 d

.

enemyofthestate1776:

is that the hard drive was filled up by errors, which, as far as I know, don't get automatically deleted; they have to be reviewed by an admin (like Jon), and then removed. 

So:

- If my program generated hundreds of errors per second (I'm 99% certain it didn't), and

- if the hard drive was already mostly full, and

- if the error logs were placed on the same hard drive as the website or operating system (bad practice),

then when the drive became full, the site could definitely go down.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

deacon:

of what you are saying. I am not a computer geek, wizard, or anything of the sort. I was trying to download stillwater's posts, and my own post about the police state, onto my computer. It came out looking strange: crap was scrambled, and in place of words were characters; nothing made sense. I had this idea that maybe the Google cache would have them, you know, seeing as it was already in cyberspace, and I could find the URLs. But seeing as PL is down, I got nothing... Now, why did I just type all that? :)

  d

.

Bitcoin SEO Technologies:

http://webcache.googleusercontent.com/search?q=cache:c8rxA1mBvQcJ:popularliberty.com/2731/police-state-pictures+&cd=1&hl=en&ct=clnk&gl=us

Pictures are still intact. The page doesn't render correctly as the style sheet is no longer available. 

I've included the method in another post in this thread. Just be sure you're viewing the cache, not the regular link.

All stillwater's posts:

https://www.google.com/search?num=30&q="by+stillwater"+site%3Apopularliberty.com

Just be sure not to click the regular link, but use the tiny pull-down menu at the end of the URLs to get to the cached copy.

BitcoinSEOTechnologies provides professional SEO services for growing the bitcoin business community.

deacon:

Excellent, thanks for the teaching. I kinda figured it was something I was doing wrong. Thank you for grabbing my post.

I will look for your explanation in the other comment you made about this topic.

 d

.

pawnstorm12:

I did see Michael's message just now.

I felt a little more at ease knowing what caused the issue.

Maybe he can work it out so that we can get back on there long enough to archive some of our content.

Merry Christmas to you, Enemy of the State. Perhaps many ex-DP'ers have a new home here.

This is the best I've come across.


"We have allowed our nation to be over-taxed and over-regulated and overrun by bureaucrats - the founders would be ashamed." -Ron Paul

Bitcoin SEO Technologies:

See my post below.

BitcoinSEOTechnologies provides professional SEO services for growing the bitcoin business community.

enemyofthestate1776:

.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

crabacado:

I hope so

A man who chops his own wood is warmed by it twice

crabacado:

two people here... one who likes to debate and one who is defending his position. Reminds me of Douglas/Lincoln.


I like it. This is healthy banter. I'll be around.

A man who chops his own wood is warmed by it twice

enemyofthestate1776:

Which is which? Pretty sure I'm both :) haha

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry
