[erlang-questions] Two beautiful programs - or web programming made easy

Mon Feb 14 13:43:57 CET 2011

On 2011-02-14, at 03:35 AM, Joe Armstrong wrote:
> 
> Ok so "separation of concerns" is good but having different notations for expressing the concerns
> is crazy- to make a web thing that interacts with a server you need to learn something like
> 
>         HTML
>         Javascript
>         CSS
>         PHP
>         MySQL
> 
> And to be able to configure Apache and MySQL - other combinations are possible.

I can agree with that. To have a functional website, you do need to know a lot of different technologies. The web evolved organically and each part of the problem space had its own solution developed over time. 

> 
> Then you have to split the flow of control to many places.
> 
> All of this is crazy madness. There should be *one* notation that is powerful enough to express all
> these things. In the browser is seems sensible to forget about css and html only use Javascript
> The only communication with the browser should be by sending it javascript.

There should, but there isn't. The truth here is that most programmers are awful at design. In any somewhat large setup, your backend programmers, your designers and your integrators (the guys just handling HTML, and CSS, maybe some Javascript) are not necessarily the same person. 

Right now the ring of web technologies is divided in a way that makes it somewhat simple to have different people from different background and knowledges to work on different part of your software. It makes sense to have the designer or integrator to be able to change the look and feel of a website without having to play in your code and maybe mess up database queries. Modern template engines in fact try to forbid all kinds of seriously side-effecting code (like DB queries) from happening in the templates.

There should be no worry for your guy working in Javascript that he'd not need to suddenly learn Erlang to be able to debug your application.

Then again, this separation of concerns allows specialists to work on their speciality with more ease. It makes things somewhat simpler in larger organisations, but quite painful for one-man operations. I'll tell you that it makes a lot of sense when you know all of the tools in the toolkit though :)

> How you generate the javascript is irrelevant - by hand or by program - who cares. If you make it by
> program the chances are that it's right.
> 
Yes and no. Generated javascript is nearly as old as the language -- many, many .NET apps had that kind of things. Some editors like Dreamweaver could generate JS for you. One of the problem with this is that it was often pure garbage, or it wouldn't work in all browsers uniformly. If you can manage to generate and capture complex behaviours in a compliant manner, all the better. I have myself lost much hope with regards to that though.

> Security is orthogonal to this - send encrypted js over the wire and make sure your key-rings are secure
> this is a completely different problem.

This is only transmission security. Encryption has nothing to do with Cross-Site Scripting (XSS, where some user is able to run arbitrary JS in your page for you and ends up stealing information), Cross Site Request Forgery (CSRF, where the attacker uses the fact your application is forgetting about things like the origin of the queries to hijack the client's session in their place. This is related to Same Origin Policy issues and not easy to handle), SQL injection, overwriting some parameters because you don't fetch them in the right order server-side (see problems with the $_REQUEST variable in PHP), etc.

1. XSS
XSS is, as mentioned above, the ability to run abritrary JS on a page. This is the risky thing with your eval. http://en.wikipedia.org/wiki/Cross-site_scripting contains many details on understanding the related issues. It's not always a simple matter of escaping. Some more advanced attacks even rely on string encoding to make sure your escaping fails. See http://www.governmentsecurity.org/forum/index.php?showtopic=18105.

2. CSRF
CSRF is a tricky thing. Because HTTP doesn't support sessions, over the years, the guys from Netscape (back then) or Opera (or whoever) ended up using Cookies to share data on every query. What happens there is that on every query the browser sends to a server, it also packages the cookies neatly in the headers -- no matter what page you were on when they were sent. The issue here is that the server might not check from what page the call is coming from.  

Basically, if twitter had an URL call such as http://twitter.com/tweet/add?message=SomeMessageHere that would automatically add a tweet from your account and I put that link in an image tag on some site, every time you would load that image, you would automatically make a call to the server, your browser sending in your cookies and making it look like YOU actually made that call, even if you didn't know. In this case, the request is especially easy to do because twitter would be using GET parameters to have side-effects on the server. By forcing people into using POST, you can make things harder, but not impossible. 

One way to work again POST is using a fake website -- let's say I use learnyousomeerlang.com. On my own site, I'll be putting a fake javascript form inside an iframe (so that the page doesn't refresh when submitted) and have the script automatically send in the POST form. Now I send the link to my trick page over twitter and everyone who clicks on it from there will be guaranteed to have their session open and sending in data. I've in fact used this trick to have the site owner at my old job to close his own admin account on his own website so he could realise the importance of the threat.

How can you solve this one? Well there are a few ways -- for one you could check the HTTP referrer, but that won't work everywhere -- if you expect calls from flash, it doesn't always send these elements of the HTTP header. In the case of HTTPS, depending on how you handle things, the header might not always be sent either so you can't know for sure. Better than that, if I'm using the <img> trick on your own website (on twitter, for twitter users), the domain will be the same, without you being able to check for anything.

The only foolproof way to do this is to use what they call 'tokens': each call you make to the server has to have a unique piece of data that the server knows about that can prove that the call you just made comes from you, but also from your own forms on your own websites. These tokens should have an expiration time and be hidden from plain view, submitted automatically with any form. If you don't have this, your application might not be safe.

This has *nothing* to do with encryption, and everything to do with not understanding the potential threats of the web correctly. It is an application-level issue, much like XSS is. And it's pretty damn important.

3. SQL injection is a different beast, where you do not properly escape the parameters of a request going to the database, letting your run arbitrary DB calls. Erlang with Mnesia doesn't have to worry about that, but Erlang with any SQL has to, even if you end up using QLC (it depends on the library at the back in this case though, and is generally safe enough). http://en.wikipedia.org/wiki/Sql_injection has sufficient details.

4.You have to consider that sometimes these attacks are combined together to be able to really do damage.

I haven't even covered using weak hashing for passwords, bad security policies on cookies, opening files on dynamic paths without filtering the input, etc.

Web application security is not a joke and it's certainly not easy. It's a very serious thing and most developers get it wrong at one point or another. Wordpress got it wrong, Twitter got it wrong, facebook got it wrong, Google got it wrong, and so on, even though they're supposed to be leaders in the field. Most of them got it wrong more than once too. This is why I kind of support a 'paranoid' line of thought.

--
Fred Hébert
http://www.erlang-solutions.com