Download posts/comments from PL with this program!

enemyofthestate1776 Tue, 12/22/2015 - 19:13

Hi all,
I am putting this on hold for now, as it has been claimed that my program took down the site.

1. I am incredibly sorry to all of you, Michael and Jon included, if this is indeed the case.

2. I have reasons to believe that it is not the case, as I explain in the comment below (posted under My View).

Be (2) as it may: if I am wrong about this, my apology in (1) stands.

OK - My View

I had hoped that this program would be a godsend to those who wished to save their posts (like Kathleen Gee, who simply wanted to retrieve her posts so she could re-post them on her blog).

However, Jon is blaming my program for taking PL down, and I feel the need to communicate my view here. I would like to state up front that I have never had a problem with Jon in any way, and have found him to be one of the best mods on the site over the years.

First, I would like to point out that there were, in total, 10 downloads of my program (the count is actually 13 at the time of this message, but 3 of those were me, testing on other computers).

The way the software worked was that you would login, then type in the username of the user whose posts you wanted to download.

The software would then issue 1 webpage request to PL, for the user's profile page.

If the page existed (and thus the user existed also), then the program would send 1 more webpage request to PL, asking for their user/posts page. 

It would then use the returned data to detect how many pages of posts existed for that user, and place a request for each of those pages on a request queue. 24 pages of posts, 24 requests. On a queue, though: only 6 were able to run at one time. That was not my decision, although I was comfortable with it, as I didn't want the site to go down due to my program flooding the server, similar to a DDoS. Wow, if this is true, I've discovered a way to take down websites with only 10 copies of a program..... Hmmmm.

The real reason for this limit of 6 requests at a time is that I used the Qt application framework to build my software. Qt, I assume to avoid crashing websites with too many simultaneous HTTP requests, has a hard limit (6) on how many HTTP requests to one host can run simultaneously. The rest are queued.

"Note: QNetworkAccessManager queues the requests it receives. The number of requests executed in parallel is dependent on the protocol. Currently, for the HTTP protocol on desktop platforms, 6 requests are executed in parallel for one host/port combination." - http://doc.qt.io/qt-5/qnetworkaccessmanager.html#details

(Now, there is of course a way around that, but I didn't do it: I used only one QNetworkAccessManager.) So please, anyone who codes: my code is still there at https://github.com/team2e16/PostRetrievePL. Download it, check it, and either tell me I'm wrong, or...
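For anyone who doesn't want to dig through the repo, here is a minimal sketch of the pattern, with hypothetical names and URL rather than my exact code: every request goes through a single QNetworkAccessManager, so Qt itself caps concurrency at 6 per host and queues the rest.

```cpp
// Minimal sketch (hypothetical names/URL, not the exact PostRetrievePL
// code). One QNetworkAccessManager handles every request, so Qt runs at
// most 6 at a time to the same host and queues the remainder.
#include <QCoreApplication>
#include <QNetworkAccessManager>
#include <QNetworkRequest>
#include <QNetworkReply>
#include <QUrl>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QNetworkAccessManager manager;   // one manager = one shared queue
    int pending = 0;

    // Enqueue one request per page of posts; Qt itself throttles these
    // to 6 in flight, exactly as the docs quoted above describe.
    const int pageCount = 24;        // e.g. 24 pages of posts
    for (int page = 1; page <= pageCount; ++page) {
        const QUrl url(QStringLiteral("http://example.com/user/posts?page=%1").arg(page));
        QNetworkReply *reply = manager.get(QNetworkRequest(url));
        ++pending;
        QObject::connect(reply, &QNetworkReply::finished, [&app, &pending, reply]() {
            // ... parse reply->readAll() for post links here ...
            reply->deleteLater();
            if (--pending == 0)      // every page has been answered
                app.quit();
        });
    }
    return app.exec();
}
```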

After it retrieved each page (6 at a time, always waiting for the website to reply before issuing new requests), it would use the returned webpage data to find a link to each post on that page, and build a list of post links to download later. I was so worried about crashing the site that I purposely avoided downloading each post immediately, and instead made the list of links, because I wanted the PL website to be finished with the previous requests before I issued more.

To that end, at each stage, the next part of the software ran a 5-second timer, which would then check whether all requests had been replied to by the website. If they had, the program continued to the next stage. If they hadn't, the program waited another 5 seconds for the requests to finish, and checked again.
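That wait stage looked roughly like this (again a sketch with illustrative names, not a paste from the repo): a repeating QTimer fires every 5 seconds and checks a pending-reply counter, and only when the counter hits zero does the program move to the next stage.

```cpp
// Sketch of the 5-second "are we done yet?" check (illustrative names,
// not the exact PostRetrievePL code).
#include <QCoreApplication>
#include <QTimer>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    // In the real program this counter is decremented inside each
    // QNetworkReply::finished handler; here a fast timer stands in for
    // replies trickling back from the website.
    int pendingReplies = 24;
    QTimer fakeReplies;
    fakeReplies.setInterval(500);
    QObject::connect(&fakeReplies, &QTimer::timeout, [&pendingReplies]() {
        if (pendingReplies > 0)
            --pendingReplies;
    });
    fakeReplies.start();

    // The actual pattern: poll every 5 seconds, and advance to the next
    // stage only once every outstanding request has been answered.
    QTimer poll;
    poll.setInterval(5000);
    QObject::connect(&poll, &QTimer::timeout, [&]() {
        if (pendingReplies == 0) {
            poll.stop();
            qDebug() << "All replies received; moving to the next stage";
            app.quit();              // stands in for starting the next stage
        }                            // otherwise, wait another 5 seconds
    });
    poll.start();

    return app.exec();
}
```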

The next stage sent the webpage requests from the posts list to the website, again 6 at a time, and again waiting for the website to reply. If there were 24 pages of posts, the total number of requests would be roughly 240 (10 posts per page). But again, that's not 240 at once. It's 6, then wait for the reply (just as you do in your browser), then 6 more, then wait, then 6, then wait. That's why it took me about 15 minutes to download Emalvini's entire posts/comments: because the program doesn't, and in fact can't, generate more than 6 concurrent HTTP requests.

Then the program would move on to comments.

a) request the user's comments page and check the total pages of comments

b) The comments were stored directly on each comments page, so if a user had 10 pages of comments, my program only made 10 more requests. The highest number I saw while downloading was approximately 100 pages, so about 1,000 comments, but only 100 HTTP requests.

So, a practical example.

I downloaded Emalvini's entire posting and commenting history: 1 HTTP request for his profile page, 1 for his posts page, around 34 requests for his pages of post listings, 340 requests for the actual posts, and about 100 requests for his 1,000 or so comments (10 per page).

So that's 476 requests. It took around 15 minutes, from memory. I went away and made coffee, talked to my kids, and came back. 15 minutes is 900 seconds.

476 requests in 900 seconds is 0.53 requests per second, so about 1 request every 2 seconds.
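If you want to verify that figure, the whole back-of-envelope calculation fits in a few lines (the request counts are the ones listed above; the 15 minutes is from memory):

```cpp
#include <cstdio>

int main()
{
    // Request counts for one full user download (figures from the post).
    const int requests = 1     // profile page
                       + 1     // posts landing page
                       + 34    // pages of post listings
                       + 340   // individual posts (~10 per listing page)
                       + 100;  // comment pages (~10 comments per page)

    const double seconds = 15.0 * 60.0;   // ~15 minutes, from memory

    // Prints: 476 requests / 900 s = 0.53 requests per second
    std::printf("%d requests / %.0f s = %.2f requests per second\n",
                requests, seconds, requests / seconds);
    return 0;
}
```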

The main reason for this is not that my program couldn't send requests faster (at least in blocks of 6). The real limitation on how many requests I could make was how fast the PL website replied: only 6 requests would be in flight at once, and it took between 0.5 and 2 seconds to receive a reply from PL.

Now let's, for the sake of argument, assume that my program was issuing requests three times as fast as this, i.e. that it took only 5 minutes to download Emalvini's stuff.

So now there would be 1.59 HTTP requests per second. Scary stuff, indeed. Pretty sure I could beat that by splitting Chrome into two separate windows and clicking refresh every second.....

And now add in the other 10 copies of the program that people downloaded.

Let's assume:

1) They all got it to work (doubtful, because shortly before the site went down I had to post links to an updated version of the software, as the original wasn't working for Windows 7 users)

2) They were on their computers from the moment they got the software until the moment of the crash, constantly feeding in new usernames to download posts/comments from, without any breaks whatsoever.

3) They also managed 1.6 requests per second.

And we now have a combined total of 17.6 requests per second. A very, very worst-case scenario. What was it that Jon said in his email to Michael?

"Multiply this by a few people and there are hundreds of heavy requests a second, causing all issues being logged on the server."

If hundreds of people had downloaded my software, then yes. This could indeed be the case. But only 10 did......

Finally, if we assume that each error message is around 300 bytes long, then at the calculated rate, 11 users (including myself) going continuously could rack up around 5 KB per second of error messages (or 18 MB per hour, or 432 MB per day, or around 0.9 GB between the time my program was released and the site went down), and that's assuming every single HTTP request was logged as an error. I don't see how that could happen, unless my program wasn't actually working. It was, and I have 40 users' posts and comments to prove it. Total size of all of those posts and comments? 30 MB...
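For anyone checking my figures, here is the same worst-case sum as code. Every input is an assumption from the paragraphs above (11 copies running non-stop at 1.6 requests per second each, ~300 bytes per logged error, every request producing one), and the ~0.9 GB figure assumes roughly two days between release and crash:

```cpp
#include <cstdio>

int main()
{
    const double copies      = 11;             // me plus the 10 downloads
    const double reqPerSec   = copies * 1.6;   // 17.6 requests/second combined
    const double bytesPerSec = reqPerSec * 300;                 // ~5 KB/s of errors
    const double mbPerHour   = bytesPerSec * 3600 / (1 << 20);  // ~18 MB/hour
    const double mbPerDay    = mbPerHour * 24;                  // ~435 MB/day
    const double gbTotal     = mbPerDay * 2 / 1024;             // ~0.85 GB in 2 days

    std::printf("%.1f req/s -> %.0f B/s -> %.1f MB/h -> %.0f MB/day -> %.2f GB\n",
                reqPerSec, bytesPerSec, mbPerHour, mbPerDay, gbTotal);
    return 0;
}
```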

Second Issue:

Jon 'quoted' me in his email. This is what he sent to Michael:

'They even stated "you're going to have to wait, this slows things down..."'
This is completely untrue. I have a copy of my original post (thanks to my program, lol). This is what I said:

"Press the 'Gather Posts and Comments' button. Patience is required here. The program now sifts through all of the users posts and comments pages, and then downloads and extracts the information from each one."

Now, I'll give Jon the benefit of the doubt and assume he is just stressed, fixing a server on Xmas Eve, and didn't get what I meant.

I didn't mean that the site would slow down. I meant that it takes time to download all of the relevant pages from PL, precisely because instead of requesting every single page at once, my program grabs them 6 at a time and then has to wait for PL's reply.

However:

One thing that is possible is that, when I was working on my program, I may have been generating many error messages in the process. Also, after I finished my first program, I was rushing out a second one that could download entire threads (not by user, but by downloading the post and the whole conversation in the comments below it, and stitching it all into one monolithic HTML page).

It is possible that my testing of the software caused the problem. But again, I can't see any conceivable way, whether via ten users plus me or in my testing, that my software generated 'hundreds of requests per second'.

So in order for this to be true, 1) the available hard drive space must already have been very low, and 2) the software (Drupal, in the case of PL) must not have sent a warning email to the responsible parties to warn of impending problems (hard drive space nearly full, lots of errors).

My theory:

I assume that I am correct on the above, and haven't overlooked something. (Possible, but I don't think so)

I assume that there is no funny business going on with Michael/Jon. I want to make this clear. I don't think they're pulling the plug early for some unknown reason.

I assume that, given the site was to be shut down within 1 more week, no further hard drive space was going to be supplied, and that Michael may have requested the site be 'drawn down' slowly, to save costs or whatever, making it possible for my program (and perhaps the influx of members we haven't seen in quite some time, some names I've never seen, in the wake of the shutdown announcement) to push it over the brink.

Again, if my software or my testing of it caused the site to go down, and I am solely responsible for downing a well-equipped (sufficient hard drive space) and much-loved website prematurely on the cusp of its imminent shutdown, I am sincerely sorry to Michael, to Jon, and to the entire DP/PL community.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry
LibertyUserName

There is no problem and that is the problem I am having.

What is the problem here?

For your sake, I'm agreeing and reiterating that I don't think it is your fault. It did not seem like anyone else was/is blaming you. If you feel bad and/or at fault, don't, because no one seems to be blaming you but yourself; that was the intent of my original comment as well. Was Jon blaming you, or pointing out the problem? It seemed to me he was pointing out the problem. No blame, no worries, no problems, no fault. It is not your fault. No one seems to think that. And if you think someone does: reading that note on PL's page now from an outside perspective, it doesn't look like anyone is blaming you, so you shouldn't feel bad about anything.

ecard71

his view as well. It's only fair, if you ask me. There was an assumption made. While neither you nor I may blame Enemy, I am very sure MANY members would simply take Michael's message as gospel truth that "hundreds" of users used the program, in effect causing the preemptive shutdown. Where does that lead?

Just my take on it.

I STILL STAND WITH RAND!

deacon

" Multiply this by a few people and there are hundreds of heavy requests a second, causing all issues being logged on the server.  To me,it reads a few people,with hundreds of requests a second...by a few who downloaded that program

 d


ATruepatriot

I was curious about that too, Deacon. And it would also count everyone reading and not logged in, right? Ad service APIs, bots, spiders, crawlers, etc.? I'm sure the site has been crazy busy the last couple of weeks, drawing a lot of general reading traffic. Even though there might be a personality question in this, there truly may have been a huge number of requests already happening before any additional ones started. :)

Note: I replied to Deacon but it shows "reply to #24" instead :)

"Jack of all Trades...Master of None" But forever learning more!

ecard71
"Note: I replied to Deacon but it shows "reply to #24" instead :)"
 
No worries Patriot, the body of your post was clear.
 
I'm going to post a link of this over in the suggestion box.

I STILL STAND WITH RAND!

ecard71

I strictly went by this on the OP: "If hundreds of people had downloaded my software, then yes. This could indeed be the case."

Appreciate you pointing that out for me.

I STILL STAND WITH RAND!

enemyofthestate1776

I think I just reacted to your assumption that the program was to blame. I see your point clearly now. Anyway, no need to bicker. Glad we both stayed respectful :)

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

LibertyUserName

directly responsible. You started your comment by saying you don't think you should be responsible and then ended it by saying you respect Jon and Michael for not naming you directly.

So what is the problem? Your program was too big for what they had set up; they were clearly scaling down.

I don't think anyone is at fault and I don't think anyone was pointing fingers. You admitted no one was pointing fingers, but then said you didn't want to be directly responsible when no one was holding you directly responsible.

enemyofthestate1776

That's it. And my wish to actually take responsibility if it was indeed my program that did the damage. And my wish to not feel OR be held responsible (at any point, now or in the future) if it wasn't actually my fault.

So I'm here just saying my view. Why is that an issue? All I did was front up here, explain my view, and what I think may have happened.

My post shouldn't be taken to mean I'm trying to say bad things about Michael or Jon. Simply that I think they may be wrong. But I also admit I could be wrong. What's the problem with that?

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

GCN3030

Perhaps the real reason the site had to go down was that your program was going to archive everything, which might defeat the purpose of destroying the community by sending all the info on the DP/PL down the memory hole. Maybe I am just cynical but from my perspective to not consider the possibility of dirty tricks seems naive at this stage of the game.


ecard71

"Maybe I am just cynical but from my perspective to not consider the possibility of dirty tricks seems naive at this stage of the game."

Agreed. Everything is possible. Question everything.

I posted a reply of other possibilities. Or it can be just as simple as Michael's message stated, "stuff just happens". Who knows?

I STILL STAND WITH RAND!

Bitcoin SEO Technologies

We will never know either mystery. 

The logs are apparently unavailable to Jon. He probably doesn't care anyway, because nobody's paying him anymore. Neither he nor Michael has a motive to dig any deeper on this. It will be remembered as MN's excuse to pull the plug in time for eggnog.

The whole thing will remain a mystery until the end of time, like Kennedy's magic bullet. However, one solid thing I learned: Don't trust Nystrom with 9 years of your best material and community building. He has a way of torching it.

enemyofthestate1776 demonstrated his ability to stand in the gap and fill a great need in time before the great cataclysm. Guess which guy I'll call on when I have a problem to solve.

Case closed.

BitcoinSEOTechnologies provides professional SEO services for growing the bitcoin business community.

enemyofthestate1776

I never say never, though.
Some weird stuff's gone down over the years on liberty forums. Either way, it's what we do NOW, in response to the setback, that will determine our future.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

LibertyUserName

the original error message lines up with this being the problem. Is he blaming you or simply pointing out the problem? 

john2k

I think, more specifically, it's a server hard drive or database partition space issue. The letter on PL mentioned "and it filled up the disk with error messages." That's the part which does line up with the hard drive space issue, which seems definite.

However, I think it's also possible that their hard drive or database partition was already running low on space.  The error messages may have just pushed it past the tipping point.

Even if the database had only 1 GB of space left, that would take an enormous number of error messages to fill up.  And I doubt DP was running the database with only 1 GB of space left, especially with hard drive space being so relatively cheap these days.  Probably had much more space than 1 GB left.

Anyhow, it's not a certainty, in my opinion, but it is possible that it was the sole cause of the issue.

@dpc_network  DP Community Network on Twitter
View the DP/PL member directory and connect with others.

enemyofthestate1776

And I agree it is possible. But given that I know how the program works, it seems to me something very strange must have been going on in order to create an enormous supply of error messages.

I even went to the extent of checking what the app was doing in Wireshark, so I know what it was, and wasn't, sending and receiving. 

I'd be more certain the app was not to blame if it weren't for the possibility of some strange HTTP header issue that made the server throw errors. That would indeed be my fault.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

enemyofthestate1776

1. The fact that error messages filled the hard drive and froze the site doesn't mean I am responsible.

There are a number of issues there, including:

a) shouldn't there be sufficient hard drive space to handle this kind of situation,

b) shouldn't there be a system to handle this kind of situation? Like if the error messages start to multiply, some kind of email gets sent to someone (I admit that Jon and Michael are probably away right now for the Xmas break, though, which I think is a fair comment)

I mean, really, if all it takes to down a website is to request a bunch of incorrect URLs, thereby filling the drive with errors, then why don't the DDoS people do it? I had 10 downloads, friend. So me and 10 others can now take a site down, just like that? We could rule the world with 1,000 of us, man!

Secondly, john2k also said

"On another note, I wonder if the server hard drive or database hard drive partition was already close to being full with maybe enough space to get the site through Dec 31st... but this issue pushed it to the limit sooner than expected. - See more at: http://www.acalltopaul.com/comment/677#comment-677"

Which was basically what I said in my conclusion.

And finally,

I actually had great respect for the fact that neither Michael nor Jon named me directly. But I felt I should directly address the issue from my view. It doesn't make sense to me, but as I pointed out above, I will happily admit my mistake should I be proven wrong.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

crabacado

Don't take it too hard if it was indirectly your fault. Personally, I think the NSA had something to do with it.

Sounds like a lot of people used your program, congrats on that!

And welcome here.

A man who chops his own wood is warmed by it twice

enemyofthestate1776

I had 10 total downloads, aside from my own downloads onto test platforms....

Thanks for your support, though. If it was my fault, I will be red-faced and very apologetic.

"I know not what course others may take; but as for me, give me liberty or give me death!" - Patrick Henry

Libera_me

and got a 404 message. It was cute, but I would appreciate a working link. :)

Speak up for those who cannot speak up for themselves, for the rights of all who are destitute. Speak up and judge fairly, defend the rights of the poor and the needy.~~ Prov.30: 8 & 9
