3 points by i4cu 2348 days ago | link | parent

By PR I assume you mean Project? Any interest in providing details?

P.S. I read your post and had a flashback. Eight or nine years ago I was working at a mega corp and couldn't get the company's shared resource team to do any work. At the time I was a BA and had been learning arc in my spare time (as a first language). Long story short I ended up coding an ETL via arc to load annual budget data (template data) from spreadsheets.

The good news was that it worked (no bugs); the bad news was that it was a hack. I lacked two core libraries: one was a spreadsheet-parsing API library and the other was an Oracle DB library. If I remember correctly, I created a defop service where people could copy/paste data from the spreadsheet into an input box. Upon submitting the form, the data would go through a validator process and return with feedback indicating errors in the data. When successful, it would embed the data into a generated SQL string that would be run from my machine via a command line call. All this was coded in like 2 days or so.
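The paste, validate, generate-SQL flow described above can be sketched roughly as follows. This is a Python illustration, not the original Arc code; the three-column layout, the `budget` table name, and all function names are assumptions made up for the example.

```python
# Hypothetical sketch: validate tab-separated rows pasted from a spreadsheet,
# then build a SQL script. Column names and the table name are assumptions.

EXPECTED_COLUMNS = 3  # assumed layout: cost_center, account, amount


def validate_rows(pasted_text):
    """Return (rows, errors); rows is a list of (cost_center, account, amount)."""
    rows, errors = [], []
    for line_no, line in enumerate(pasted_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines from the paste
        cells = [cell.strip() for cell in line.split("\t")]
        if len(cells) != EXPECTED_COLUMNS:
            errors.append(
                f"line {line_no}: expected {EXPECTED_COLUMNS} columns, got {len(cells)}"
            )
            continue
        cost_center, account, amount_text = cells
        try:
            amount = float(amount_text.replace(",", ""))
        except ValueError:
            errors.append(f"line {line_no}: amount {amount_text!r} is not a number")
            continue
        rows.append((cost_center, account, amount))
    return rows, errors


def sql_quote(text):
    """Quote a string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_insert_script(rows, table="budget"):
    """Build one INSERT statement per validated row."""
    statements = [
        f"INSERT INTO {table} (cost_center, account, amount) "
        f"VALUES ({sql_quote(cc)}, {sql_quote(acct)}, {amount:.2f});"
        for cc, acct, amount in rows
    ]
    return "\n".join(statements)
```

The error list plays the role of the feedback returned to the form. A loader written today would more likely pass values as bind parameters through a database driver; string building appears here only because that is what the comment describes.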

LOL. Companies have processes, but when things can't get done by the standard process watch every manager in the company thank you for saving the day. Sadly I think I got a bigger bonus than anyone on the dev team even though they probably would have done it right. I guess life rewards results over all else.



3 points by krapp 2346 days ago | link

Yes, sorry, I meant pull request.

Although I do consider it a personal project to fix the forum as much as I can and eventually spin off a lightweight, anonymous fork.

The most ambitious coding I've done was writing a custom CSV parser to add parts catalog data to the web backend of a lawnmower parts store, but that was in PHP. This is my first practical experience using a lisp, though.

-----

3 points by i4cu 2346 days ago | link

Well if you're going to make fixes to the forum fix this one too:

http://arclanguage.org/item?id=20272

One thing to consider is that the code is really, really old and was built before cloudflare became a thing. Even the auth code is now fairly outdated (no captcha, no CSRF, etc. etc).

Don't get me wrong, it's still usable, but depending on why you're doing this it may not be worth it. If it's to learn Arc then swing away, but if I was looking to set up a forum today I'd sooner look into installing the open source version of Discourse.

edit: also check out shader's post on "Next Steps" [1] and more specifically [2]. Some of these items might overlap with your plans, and if so maybe you can put a call out to others already doing the same thing to share in the workload.

1. http://arclanguage.org/item?id=20319 2. http://arclanguage.org/item?id=20324

-----

3 points by krapp 2345 days ago | link

CSRF and captcha are definitely on the list, since I'm working on the forms anyway.

I do like reverse engineering the new features from HN in principle, although it's a shame that it's even necessary.

-----

3 points by krapp 2336 days ago | link

...speaking of which, I just now managed to correctly send and verify a recaptcha request.

Just have to figure out how to parse the JSON response and then integrate it into the forum and recaptcha should work.
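For reference, the request/parse step could look something like the sketch below. It is Python rather than Arc, and only an illustration: the endpoint URL and the `secret`, `response`, `remoteip`, `success` and `error-codes` field names come from Google's reCAPTCHA documentation, while the function names and the `invalid-json` label are made up for the example.

```python
import json
from urllib.parse import urlencode

# Google's documented verification endpoint for reCAPTCHA.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_body(secret, token, remote_ip=None):
    """Form-encode the POST body for the siteverify endpoint."""
    fields = {"secret": secret, "response": token}
    if remote_ip is not None:
        fields["remoteip"] = remote_ip  # optional parameter
    return urlencode(fields).encode("ascii")


def parse_verify_response(raw_json):
    """Return (ok, error_codes) from the JSON text the endpoint sends back."""
    try:
        payload = json.loads(raw_json)
    except ValueError:
        # "invalid-json" is our own label, not one of Google's error codes.
        return False, ["invalid-json"]
    if not isinstance(payload, dict):
        return False, ["invalid-json"]
    ok = payload.get("success") is True
    return ok, list(payload.get("error-codes", []))
```

Sending the request is left out so the sketch runs offline; in Python it would be a POST of `build_verify_body(...)` to `VERIFY_URL`, for example with `urllib.request.urlopen(VERIFY_URL, data=body)`.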

I really should have checked that there was already an http library before trying to get it to work with Racket's, though.

-----

2 points by akkartik 2335 days ago | link

What's the thinking behind implementing captcha? I really, really hate ReCaptcha. For most sites the payoff to a spammer isn't really worth even the simplest robot test. A new site that throws ReCaptcha is an instant bounce.

-----

3 points by krapp 2335 days ago | link

>What's the thinking behind implementing captcha? I really, really hate ReCaptcha.

It might be useful if it only shows up on the login/signup page after failed attempts, but it might also be overkill. Personally, I prefer overt solutions to opaque ones like shadowbanning and throttling IPs. But mostly it seemed like a good way to figure out some basics (managing keys, how the app server manages the form loop, API responses and JSON.)
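The "only after failed attempts" idea could be sketched like this. It is a Python illustration with an assumed threshold and a plain in-memory dict, not anything taken from the forum code.

```python
# Hypothetical sketch: require a captcha only after repeated failed logins.
# The threshold and the in-memory dict are assumptions for illustration.

CAPTCHA_THRESHOLD = 3  # assumed number of failures before a captcha appears


class FailedLoginTracker:
    def __init__(self, threshold=CAPTCHA_THRESHOLD):
        self.threshold = threshold
        self.failures = {}  # key (e.g. username or IP) -> consecutive failures

    def record_failure(self, key):
        self.failures[key] = self.failures.get(key, 0) + 1

    def record_success(self, key):
        self.failures.pop(key, None)  # a good login resets the counter

    def captcha_required(self, key):
        return self.failures.get(key, 0) >= self.threshold
```

The login handler would call `captcha_required` before rendering the form, so a user who logs in correctly the first time never sees a challenge.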

I could finish the test I have, push that and leave integration for later.

-----

2 points by akkartik 2335 days ago | link

I agree with overt vs shadowbanning. My tendency is to just add tools for moderation. Let bots sign up and spam, but make it easy to take them down and ban them at that point. Arc is reasonably decent there.

-----

2 points by i4cu 2335 days ago | link

If the content is publicly accessible and free then that works, but otherwise it doesn't help.

-----

2 points by akkartik 2335 days ago | link

I don't follow the connection.

-----

3 points by i4cu 2335 days ago | link

Sorry, my comment was rather lazy. Let me try again.

I get the impression many of the people who want to start an HN-like clone are considering using it as a content delivery platform. For example, http://arclanguage.org/item?id=20452 notes a revenue sharing model for specialized content. Now, if that's the case, these site owners will have both spammers and content scrapers to contend with. So my initial comment was referring to content scrapers too.

However, there's another thing to consider: session storage costs. About 6 months ago I went through a process to reduce the cost of data held in memory for session storage (redis in this case). The session data was continually analyzed to determine both who the bad users were and what value the good users were getting out of the app (feature planning etc). It was an interesting process, where just by reshaping the session data, thousands of dollars per month could be saved in db fees. Now I realize that someone starting a HN clone is probably not dealing with that, but I'd be willing to bet that part of the reason Captcha was implemented in HN proper was to reduce fees associated with the volume of requests, session costs and even network load. It's my feeling that, generically speaking, adding captcha functionality is a good option to have.

-----

2 points by i4cu 2335 days ago | link

> I'd be willing to bet that part of the reason Captcha was implemented in HN proper was to reduce fees associated with the volume of requests, session costs and even network load.

Let me add some info to that.... I'm not sure if anyone noticed, but HN has implemented new session management strategies. You can see this as your login is now maintained across multiple devices, where the arc code (that we have access to) logs you out upon logging in elsewhere. I also believe that when pg handed over the HN code significant changes occurred including how session data is stored and how that data is utilized to integrate with cloudflare. Obviously I'm making big guesses, because I don't have access to the code, but I'm willing to bet the changes HN has put in place would surprise everyone here.
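To make the difference between the two session behaviours concrete, here is a small Python sketch. It is purely illustrative: it does not describe HN's code or the Arc source, and the class and method names are invented.

```python
import secrets


class SingleSessionStore:
    """One active token per user: a new login invalidates the previous one."""

    def __init__(self):
        self.token_by_user = {}

    def login(self, user):
        token = secrets.token_hex(16)
        self.token_by_user[user] = token  # overwrites any earlier token
        return token

    def is_valid(self, user, token):
        return self.token_by_user.get(user) == token


class MultiSessionStore:
    """Many active tokens per user: each device keeps its own login."""

    def __init__(self):
        self.tokens_by_user = {}

    def login(self, user):
        token = secrets.token_hex(16)
        self.tokens_by_user.setdefault(user, set()).add(token)
        return token

    def is_valid(self, user, token):
        return token in self.tokens_by_user.get(user, set())
```

With the first store a login from a phone silently ends the laptop session; with the second both stay valid, at the cost of storing more tokens per user.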

Sadly everyone who sees HN today will come here and look for the source code not realizing what's available here is not modern nor comparable.

-----

3 points by krapp 2335 days ago | link

>Sadly everyone who sees HN today will come here and look for the source code not realizing what's available here is not modern nor comparable.

One thing I noticed when Arc gets brought up on HN is that everyone seems interested in the language but not so much the application. People seem to cargo-cult the forum for some reason.

-----

2 points by akkartik 2335 days ago | link

Fascinating, thanks for that war story and perspective. That makes a lot of sense.

-----

2 points by i4cu 2335 days ago | link

> For most sites the payoff to a spammer isn't really worth even the simplest robot test.

Maybe I am missing something.... Isn't captcha just a fairly simple robot test (and thus preventing spam)? Or are you suggesting something even simpler? Because I've run a few sites and had tried implementing very simple programmatic obstacles and it really didn't stop the spammers.

Maybe the better question is - what would you suggest?

-----

3 points by akkartik 2335 days ago | link

Simple captchas are totally fine. My problem is with Google's ReCaptcha in particular, where the problems have gotten so difficult that I mostly can't prove I'm a human.

-----

2 points by i4cu 2335 days ago | link

Hmm... That's not my experience. 90% of ReCaptcha tests are invisible if not a simple checkbox. Only a small optional subset requires introducing a problem solver.

https://developers.google.com/recaptcha/docs/versions

I do agree though, the text ones can be a pain, but that doesn't happen too often. Sadly HN seems to push those more often.

-----

1 point by akkartik 2335 days ago | link

I don't get either a checkbox or text anymore. I get to identify pictures with cars and signs and whatnot.

-----

2 points by i4cu 2335 days ago | link

I've never failed a cars or signs test. It's only the text scribbles that kill me :)

I have to wonder what you're doing that makes google zero in on you.... lol. Tor? maybe proxy IPs?

-----

3 points by rocketnia 2330 days ago | link

One time I spent a good 20 minutes identifying cars, signs, and storefronts before it would let me in, and that was with no VPN or Tor or anything. At some point they oughta be paying us. :-p

-----

3 points by hjek 2325 days ago | link

Someone tried to take Google to court already, arguing exactly that :-)

https://arstechnica.com/tech-policy/2016/02/judge-tosses-pro...

-----

3 points by rocketnia 2325 days ago | link

"Plaintiff has failed to allege how these numerous benefits outweigh the few seconds it takes to transcribe one word."

A few seconds is qualitatively different from 20 minutes, I'd think. :-p

-----

1 point by i4cu 2329 days ago | link

This is probably going to sound super crazy, but I have to say it...

I know you (akkartik) have a google account, because I remember when you moved your blog over to google's services (I think they call it 'circles' or some such). I also remember you created a news aggregator application that scraped content. Yes, I know, it was a long time ago in a galaxy far, far away..., but still...

I'm thinking that google identified your scraping work and deemed you a risky robot type, but they also probably correlated your IP from the scraping to your IP from your google services login and tagged you that way. So now, even if your IP changed, they'll continue to have you in their cross-hairs for, like, ever.

Any takers? If you'd like I can also look into who killed JFK...

-----

3 points by akkartik 2329 days ago | link

Lol, no.

Other possible reasons:

a) My cookie acceptance policies are non-standard. (I no longer even remember what they are anymore.)

b) I'm often behind a VPN for work.

c) I'm often on my phone, or tethering from my phone.

Complaints about ReCaptcha are fairly common if you look on HN and so on. You don't have to have run a scraper to hit it, I don't think. I think you may be a robot from the future for never having problems with the pictures of signs and cars :p

Final minor correction: I've played with Google+ in the past (I actually worked at Google on Circles for a year) but I never moved my blog there. I just linked to my blog posts from there.

-----

2 points by i4cu 2329 days ago | link

> Complaints about ReCaptcha are fairly common if you look on HN and so on.

Yeah I'm aware of the complaints, but in my mind HN wouldn't be the best resource of information for such an assessment. By default HN members are non-standard in most ways that would matter to ReCaptcha.

It's an interesting dilemma and one that I'm coming up on soon as I plan to release a new app in a few months time. In my case the intended audience for the app is very widespread and not specific to a tech audience. It could be that the vast majority of my users (if I get any - lol) would never have a problem, because the vast majority of people using the net don't know what a VPN is or how to change a cookie setting (just as examples).

I'll have to give it some more thought, but in the meantime, are you aware of any resources on the matter that would be more reflective than HN?

edit: I often find info like this [1]:

  "Different studies conducted by Stanford University, Webnographer and 
  Animoto, showed that there is an approximately 15% abandonment rate when the 
  users are faced with CAPTCHA challenge."
1. https://www.infosecurity-magazine.com/opinions/captcha-fraud...

But really I do expect to take some loss when using reCaptcha. The question really becomes is it worth it? After all spam can also cause users to leave and content scrapers can also de-value your product.

-----

3 points by akkartik 2329 days ago | link

Certainly, it's an issue only tech-savvy people will have.

However, every step you put users through is a place where your funnel will leak. So in your place I wouldn't turn on captcha until I actually see spam being a problem.

Also, independently, since I am personally ill-disposed towards ReCaptcha I have zero urge to help maintain it in Anarki :) You're welcome to hack on it, though!

-----

2 points by i4cu 2329 days ago | link

> So in your place I wouldn't turn on captcha until I actually see spam being a problem

agreed.

> I have zero urge to help maintain it in Anarki

It's really only a few lines of code (probably smaller than a unit test) and it has already exposed json bugs, so I consider it a win all around.

At any rate it's probably verging on discussion overkill for such a small item. :)

-----

2 points by krapp 2329 days ago | link

I think it's less important to have Recaptcha or not than it is to have a working POC for interaction with a remote JSON API, and for parsing JSON in general, since that opens up a lot of possibilities. Recaptcha itself is just the low-hanging fruit for that, since it's so simple.

As far as integration goes, we could just leave it up to whoever wants to do the work, or make it easily configurable with the default being not to use it at all.

-----

3 points by krapp 2327 days ago | link

... well, it's up[0].

I don't know why the tests keep failing, though, it works locally.

[0]https://github.com/arclanguage/anarki/pull/102

-----

3 points by rocketnia 2327 days ago | link

It's great to see a JSON API integrated in Arc. :)

I took a look and found fixes for the unit tests. Before I got into that debugging though, I noticed some problems with the JSON library that I'm not sure what to do with. It turns out those are unrelated to the test failures.

I left details about these in comments on the closed pull request, which might not have been the best place: https://github.com/arclanguage/anarki/pull/102

-----

2 points by krapp 2327 days ago | link

The JSON solution is a quick and dirty hack by a rank noob, and I'm sure something better will come along.

And in hindsight the problem with the (body) macro should probably have been obvious, considering HTML tables are built using (tab) and not (table). I'm starting to think everything other than (tag) should be done away with to avoid the issue in principle, but that would be a major undertaking and probably mostly just bikeshedding.

-----

3 points by hjek 2326 days ago | link

CAPTCHA sounds interesting.

Would be seriously cool if the CAPTCHA was implemented in Arc and generated the images locally instead of relying on some Google SaaS.

(I twiddled a bit with Racket's image manipulation functions in img.arc in the Anarki repo, if that's any help.)

-----

3 points by akkartik 2347 days ago | link

Can you link to the post you are referring to?

PR = Pull Request.

https://github.com/arclanguage/anarki/pull/99 https://github.com/arclanguage/anarki/pull/100

-----

3 points by i4cu 2347 days ago | link

Ahh, well my whole comment is completely off topic; sorry. That's what 10 years of being a BA does to the mind :)

> Can you link to the post you are referring to?

I guess I should be using the expected terminology, that being a 'comment' not a 'post'. In my mind he 'post'ed a comment, but I understand I should have been more clear.

-----

2 points by akkartik 2347 days ago | link

Oh I think I understand now, thanks :)

-----