[erlang-questions] Two beautiful programs - or web programming made easy

Thu Feb 17 05:26:25 CET 2011

On Thu, 17 Feb 2011 02:34:00 +1100, Joe Armstrong <erlang@REDACTED> wrote:

> 2011/2/16 Edmond Begumisa <ebegumisa@REDACTED>
>
>> I'm a glass-half-full kinda guy :)
>>
>> I'd like to think this is the kind of community where someone can say ..
>>
>> "Hey, I've got this whacky idea. It's early stages but I think I'm onto
>> something."
>>
>> Then in *addition* to the community saying "Well, there are problems x,
>> y and z you may have overlooked", the community *also* says "Possibly p,
>> q, and r might help with these."
>>
>> I think if you look at what Joe's doing, it can be expanded on to get
>> something very useful. Let's take some of everyone's concerns and
>> actually *try* to give Joe some advice on how they might be addressed.
>>
>> I've started inline, hopefully others can add some input...
>>
>>
>>
>> On Tue, 15 Feb 2011 07:37:44 +1100, Frédéric Trottier-Hébert <
>> fred.hebert@REDACTED> wrote:
>>
>>
>>>
>>> On 2011-02-14, at 15:17 PM, Edmond Begumisa wrote:
>>>
>>>> You've outlined a nice list of the top security concerns of *every*
>>>> web-developer generating dynamic content from the client side (probably
>>>> a good chunk of websites uploaded since 2005).
>>>>
>>>> What I still don't get is why you find the generation of dynamic content
>>>> from static js files acceptable while finding the same thing done with
>>>> eval unacceptable. I don't see how XSS, CSRF, SQL Injection, and all the
>>>> things you list are more unmanageable from static js that generates
>>>> content vs streamed js that generates content.
>>>>
>>>>
>>> The point of my security list wasn't so much about eval itself as about
>>> contradicting the precise point that 'all you need to do is encrypt
>>> javascript'. That's a reductionist and erroneous view.
>>>
>>> Generating dynamic content from static JS files has a few advantages:
>>> caching on the browser side, distributing the code via CDNs rather than
>>> through your app server,
>>>
>>
>> How do AJAX sites solve this? One way is to break the "one page app" into
>> a few pages. Maybe he could introduce a window concept that maps to a
>> page...
>>
>> Pid ! {new, window(...)}
>> .. work .. work...
>> Pid ! {new, window(...)}
>>
>> There's a start.
>>
>>
>>> benefiting from JIT if available (rather than calling the
>>> compiler/interpreter each time you update content), and potential static
>>> analysis of code (even through things like JSLint). There's also a
>>> smaller payload on the network and less bandwidth: if you only send the
>>> functions once rather than on every call, you'll save a lot.
>>>
>>
>> I suggested earlier sending parts of your app in ordinary static js
>> files, then calling the functions as libraries. The benefits above will
>> start to be felt.
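A minimal JavaScript sketch of that split. The assumptions are illustrative: the static file defines a small table of named functions (`setTitle` and `addRow` are hypothetical examples), and the server streams only short JSON messages of an assumed form `{"fn": name, "args": [...]}` naming which function to call. The client looks the name up instead of passing code to `eval`. So that it runs outside a browser, the handlers update a plain state object; real ones would touch the DOM.

```javascript
// Static library, shipped once in a cacheable .js file.
// The handler names are hypothetical examples.
const handlers = {
  setTitle(state, text) {
    state.title = String(text);
    return state;
  },
  addRow(state, cells) {
    state.rows.push(cells.map(String));
    return state;
  },
};

// Apply one streamed message of the assumed form {"fn": name, "args": [...]}.
// Names not defined in the table are rejected, never evaluated as code.
function dispatch(state, json) {
  const msg = JSON.parse(json);
  if (!Object.prototype.hasOwnProperty.call(handlers, msg.fn)) {
    throw new Error("unknown function: " + msg.fn);
  }
  return handlers[msg.fn](state, ...(msg.args || []));
}
```

Because the function bodies travel once in a cacheable file, each update costs only the size of the message. Anything not in the table is refused rather than executed.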
>>
>> Hmmm.. I wonder what Nitrogen does, they might have some tricks.
>>
>>
>>> In most cases, rendering the page (CSS included), running the JS and
>>> transferring the data account for 90% of the time a user will wait when
>>> querying a page. Streaming JS to then evaluate it is going to be terrible
>>> for performance on larger scale applications.
>>>
>>
>> I dunno about this. Might be a bit of a blanket statement. Wasn't the very
>> reason AJAX came around to INCREASE performance of larger scale
>> applications *precisely* by streaming markup and JS for evaluation and
>> rendering on-demand rather than all-at-once because in the latter case
>> you normally send more than is actually needed?
>>
>>
>>> There's probably more to add to the list, but that's what I can think of
>>> in 15-30 seconds.
>>>
>>
>> Likewise, I could probably add more but these are the ideas for improving
>> Joe's concept that I can come up with in 15-30 seconds ;) Others can pitch
>> in.
>>
>>
>>
>>> And things like what I mentioned are not more or less unmanageable from
>>> static files (I agree with you there), except for XSS:
>>>
>>> XSS is better treated in many cases by things like JS frameworks. If I'm
>>> getting the result from some web query into JS, I will receive a neat
>>> string, without a chance of it being wrong. If I then push this string
>>> through my framework (say jQuery), it'll take care of doing specific
>>> escaping of things like element attributes, element content, etc.
>>>
>>
>> What's stopping you?
>>
>> As I illustrated in the previous mail on security, you can call jQuery
>> from code that's being run in eval too! Joe's calling everything from
>> jQuery to SVG libraries!
>>
>>
>>> If I do it dynamically through my applications, chances are much better
>>> I'll get the escaping wrong in Erlang (and you need to escape on more
>>> levels) than in JS, where it can be done on a per-element basis when
>>> building the DOM: "create a tag, add the attribute, add another one, add
>>> the tag's content, push it" vs. "mix and match all these strings into
>>> hopefully valid JS". This is even truer when you consider hacks such as
>>> Google's UTF-7 encounter back in the day. This follows the idea that JS
>>> knows JS better.
>>>
>>>
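To make the "per-element" point concrete, here is a small plain-JavaScript sketch (no framework assumed) of escaping for HTML element content and quoted attribute values. In a browser, setting `textContent` or calling `setAttribute` sidesteps string escaping entirely. The functions below are what you have to get right when you glue strings together instead, whether in JS or in Erlang. The sketch also assumes the page declares its character encoding (e.g. UTF-8); the UTF-7 trick mentioned above worked by getting the browser to guess a different one. `escapeHtml` and `element` are hypothetical helper names.

```javascript
// Escape a string for HTML element content or a quoted attribute value.
// It does NOT make a value safe for unquoted attributes, URLs
// (e.g. "javascript:"), inline <script> or CSS: each needs its own rules.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must come first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build <tag attr="...">text</tag> from parts, escaping each part for its
// own context instead of concatenating raw input into one string.
function element(tag, attrs, text) {
  if (!/^[a-z][a-z0-9]*$/i.test(tag)) throw new Error("bad tag name");
  const attrText = Object.entries(attrs)
    .map(([name, val]) => {
      if (!/^[a-z][a-z0-9-]*$/i.test(name)) throw new Error("bad attribute name");
      return ` ${name}="${escapeHtml(val)}"`;
    })
    .join("");
  return `<${tag}${attrText}>${escapeHtml(text)}</${tag}>`;
}
```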
>> Right, let's convert that statement from a criticism into a really good
>> piece of advice:
>>
>> Joe: That code where you're manipulating the DOM, where you do
>> ".insertElement" and such, know what? Better do that via jQuery instead.
>
>
> Why? - the only reason I can think of is cross-browser compatibility.

Apparently, jQuery is super smart in how it handles this in relation to
possible XSS attacks ... or so I hear. Personally, I don't really see the
difference. Frédéric seems more informed here and would be in a better
position to explain.

> At the moment I don't really like libraries since I want to see what's
> going on as near to the bottom level as I can get - libraries obscure
> what's going on. My goal is understanding, and minimal lines of code (to
> aid understanding)
>

I too would only use a client side library for things that I cannot do  
myself (like render SVG on non-SVG supported browsers) or things that are  
too tedious to do manually (like handling cross-browser quirks). But I'm  
informed on this thread that there's more to these tools like escaping  
JS/Tags/attributes, which I'm told is really hard to do correctly from  
Erlang. Again, I'm not sure about this myself, and planned on  
investigating further.

> Right now I have several competing ideas - I could do with some informed
> advice here.
>
> I'll fire off some questions:
>
> For Graphics:
>
> 1) SVG or
> 2) HTML5 canvas
>
> Canvas is faster but has no support for objects, making onclick, ondrag
> and onmouseenter in a canvas a pain, and it probably either eats CPU or
> memory. Any good libraries? I've looked at all the well-known ones. I only
> want object support for a canvas (ie object grouping and adding click,
> move events to object groups) - not lots of other stuff. This is why I'm
> currently using SVG.
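Object support on a canvas usually means keeping your own list of shapes and hit-testing the mouse position against it. Here is a minimal sketch that assumes axis-aligned rectangles only. `Scene`, `addGroup`, `on`, `hitTest` and `dispatch` are hypothetical names, not any library's API. In a browser you would attach one `click` (or `mousemove`) listener to the canvas element and call `scene.dispatch("click", e.offsetX, e.offsetY)` from it.

```javascript
// Shapes are kept in groups; handlers are registered per group.
class Scene {
  constructor() {
    this.groups = []; // drawing order: later groups are on top
  }

  addGroup(name, rects) {
    const group = { name, rects, handlers: {} };
    this.groups.push(group);
    return group;
  }

  on(group, type, fn) {
    group.handlers[type] = fn;
  }

  // Topmost group with a rectangle containing (x, y), or null.
  hitTest(x, y) {
    for (let i = this.groups.length - 1; i >= 0; i--) {
      const g = this.groups[i];
      const hit = g.rects.some(
        r => x >= r.x && x < r.x + r.w && y >= r.y && y < r.y + r.h
      );
      if (hit) return g;
    }
    return null;
  }

  // Call the hit group's handler for this event type, if it has one.
  dispatch(type, x, y) {
    const g = this.hitTest(x, y);
    if (g && g.handlers[type]) g.handlers[type](g, x, y);
    return g;
  }
}
```

Each event costs a linear scan over the groups. For large scenes, a spatial index such as a grid or quadtree is the usual next step.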
>

Hmmm... Question: What is your higher priority? Having a canvas to deliver  
UI to your users or making the browser easier to program UI to?

If you are not so bothered whether delivery is to an actual browser so  
long as it's done over the web using web-standards then I have a  
suggestion...

SVG support in the Mozilla Framework is pretty OK (save a few features);  
the Gecko layout engine handles SVG natively so it renders pretty quickly  
since no JS wrapper libraries are required. What if I whipped up a little  
XULRunner client that you could push your UI to?

Your users could download and install this generic client. At startup, it  
could pop up a dialog asking them which app (i.e. server URI) they want to
connect to, then open a new blank window to which you could push all your SVG.
One could also push HTML, XUL, etc.

Sure, it's not a browser that everyone already has, so your end-users will
have to bother installing it. But they'll have to do this only once since
the client is generic. Besides, what you're doing doesn't really fit  
precisely into the browser idiom anyway, so maybe a special client is  
appropriate. And this XULRunner-based generic client could be compiled for  
and installed on every platform that Firefox supports.

> For the keyboard:
>
> How can I get keystrokes into my program from javascript - virtually
> everything I try is buggy - Do I really have to sniff the browser type and
> fix the bugs of every single browser ..
>

AFAIK, yes. Keystrokes are one of those cross-browser quirks library
writers don't focus on. So you have to wade through the mess yourself.  
But..

Another advantage of a generic XULRunner-based client: You won't have to  
deal with cross-browser quirks since you'll only be targeting 1 layout  
engine, 1 js engine, 1 DOM implementation, 1 etc, 1 etc.
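If you do end up handling keys in the browser yourself, the usual approach is a single normalising function that every listener goes through. The rough sketch below assumes you listen to `keydown`. It prefers the standard `event.key` property where the browser provides it and falls back to the legacy `keyCode`/`which` numbers otherwise. The tables are deliberately tiny and illustrative; real code needs more entries and testing in each target browser. `normalizeKey`, `LEGACY_CODES` and `KEY_ALIASES` are hypothetical names.

```javascript
// Legacy keyCode values for a few non-printing keys (keydown/keyup).
const LEGACY_CODES = {
  8: "Backspace", 9: "Tab", 13: "Enter", 27: "Escape",
  37: "ArrowLeft", 38: "ArrowUp", 39: "ArrowRight", 40: "ArrowDown",
  46: "Delete",
};

// Some older implementations of `key` used non-standard names.
const KEY_ALIASES = {
  Esc: "Escape", Left: "ArrowLeft", Up: "ArrowUp",
  Right: "ArrowRight", Down: "ArrowDown", Del: "Delete",
};

// Return one key name for a keydown event, or null if it can't be mapped.
function normalizeKey(event) {
  if (typeof event.key === "string" && event.key !== "Unidentified") {
    return KEY_ALIASES[event.key] || event.key;
  }
  const code = event.keyCode || event.which;
  if (LEGACY_CODES[code]) return LEGACY_CODES[code];
  // Digits on the main row (Shift is ignored here).
  if (code >= 48 && code <= 57) return String.fromCharCode(code);
  if (code >= 65 && code <= 90) {
    // keydown reports letters as upper-case codes; Caps Lock is ignored.
    const ch = String.fromCharCode(code);
    return event.shiftKey ? ch : ch.toLowerCase();
  }
  return null;
}
```

Each listener then calls `normalizeKey(e)` and never looks at the raw codes. For actual text entry, as opposed to shortcuts, keydown codes say nothing about keyboard layout, so they are a poor source of characters.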

> Rich text. I want to do pixel exact typography
>
> I can make rich text by adding spans and css and stuff in the dom, but I
> want pixel accurate sizing of spans - I want to do the following:
>
> define several on screen divs with absolute size and position. Link them  
> in some order. for example say
>
> <div id="a" style="position:absolute; ..." next="b"></div>
> <div id="b" style="position:absolute; ..." next="c"></div>
>
> Then given rich text <p><span class="c1">...</span> I want to flow this
> into div a, so that it overflows into div b - I need to calculate, to the
> pixel, where to split the text in order to do this.
>

I don't think you can control the size of spans in that way. Spans are
non-replaced inline elements, so the width and height properties don't
apply to them...

http://www.w3.org/TR/CSS2/visudet.html#the-width-property
http://www.w3.org/TR/CSS2/visudet.html#the-height-property

Though Mathias and David have indicated web-presentation isn't designed  
for this, I have seen people pull this kind of thing off with client-side  
js trickery (don't remember where exactly, I think it was one of those  
AJAX word-processors), but it went something like this...

* Create an invisible DIV with its width set to that of DIV "a" but its
height set to "auto"
* Give the invisible DIV the typography settings with which you want to
measure (font, point size, etc).
* Populate the invisible DIV with your text. The DIV should expand
downwards to fit the text.
* Loop, removing say 100 characters from the invisible DIV until its
height is smaller than DIV "a"
* Now loop, adding 1 char back until the invisible DIV's height is larger
than DIV "a"
* You've now found your sweet spot.
* Put those chars less one into DIV "a"
* Put the rest into DIV "b"

IIRC, the algo was optimised not to start with the entire text, but take a  
reasonable guess based on the size of DIV "a". It's ugly but I remember  
being surprised how well it worked.
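The measuring loop described above can be sketched as follows. To keep it testable, the measurement is passed in as a function. In the browser, a hypothetical `measureHeight(text)` would set the invisible DIV's `textContent` and return its `offsetHeight`; the DIV would have the same width and typography as DIV "a". The 100-character step and the starting guess mirror the description. Returning the largest prefix that fits is equivalent to "those chars less one". `fitLength` and `splitForBoxes` are hypothetical names. This version splits mid-word, whereas a real one would back up to a word boundary.

```javascript
// How many characters of `text` fit in a box of height `maxHeight`?
// measureHeight(s) must return the rendered height of s at the box's width,
// and is assumed never to decrease as s gets longer.
function fitLength(text, maxHeight, measureHeight, guess = text.length, step = 100) {
  const fits = n => measureHeight(text.slice(0, n)) <= maxHeight;
  let n = Math.max(0, Math.min(guess, text.length));
  // Coarse pass: grow in big steps while the next chunk still fits...
  while (n < text.length && fits(Math.min(n + step, text.length))) {
    n = Math.min(n + step, text.length);
  }
  // ...or shrink in big steps until the current prefix fits.
  while (n > 0 && !fits(n)) {
    n = Math.max(0, n - step);
  }
  // Fine pass: add one character at a time while it still fits.
  while (n < text.length && fits(n + 1)) {
    n += 1;
  }
  return n;
}

// Split text between box "a" (what fits) and box "b" (the rest).
function splitForBoxes(text, maxHeight, measureHeight, guess) {
  const n = fitLength(text, maxHeight, measureHeight, guess);
  return [text.slice(0, n), text.slice(n)];
}
```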

- Edmond -

> If I could do this I could easily port erlguten to run in a browser
>
> All these seem like pretty basic things - but I can't seem to find any code
>
> /Joe
>
>
>
>
>
>>
>>
>>> That's the same reason why you might want things like Erlang handling
>>> Erlang parsing, SQL handling its own escaping, etc. If you generate and
>>> send JS as one over the wire, you will have to double-check it
>>> server-side.
>>>
>>
>> How about adding templating? (I suggested this to Joe off-list)...
>>
>> One could possibly use leex/yecc to compile, say, a "std.tpl" file and
>> access it from Erlang code like so...
>>
>>  Pid ! {insert, std.grid(List)}
>>
>> which might use the content of std.tpl to produce...
>>
>>  Pid ! {insert, <<"<table>blah blah</table>">>}  % Or <<"createElement(blah)">>
>>
>> which might then stream to the client...
>>
>>  document.body.innerHTML = "<table>blah blah</table>"; /* Or a
>>  createElement or jQuery insert call */
>>
>> The nice thing with this is, std.tpl could have versions.
>>
>> Joe likes SVG: so his std.grid(List) might produce some fancy SVG code.
>> I like XUL: so my std.grid(List) might produce "<grid>blah</grid>"
>>
>> Templating might be extendable, so you might have an app specific my.tpl
>> which extends on std.tpl...
>>
>> so when you: Pid ! {new, my.wnd(..)}
>>
>> the client gets a new page with stylesheet references, script tags, etc,
>> specified in the my.tpl
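One way to picture the "std"/"my" idea is sketched below in JavaScript, purely for illustration; on Joe's side the equivalent would be Erlang functions producing binaries. Templates are functions that return markup strings. A variant (SVG, XUL, plain HTML) would be another object with the same function names. "Extending" is delegation: anything `my` doesn't define falls back to `std`. All names here (`std`, `my`, `grid`, `wnd`, `esc`) are hypothetical.

```javascript
// Minimal escaping for element content and quoted attribute values.
function esc(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Base template set: a grid rendered as an HTML table.
const std = {
  grid(rows) {
    const body = rows
      .map(r => "<tr>" + r.map(c => "<td>" + esc(c) + "</td>").join("") + "</tr>")
      .join("");
    return "<table>" + body + "</table>";
  },
};

// App-specific set extending std: lookups not found on `my` go to `std`.
const my = Object.create(std);

// A new window/page carrying the stylesheet references for this app.
// `content` is already-rendered markup, so it is deliberately not escaped.
my.wnd = function (title, stylesheets, content) {
  const links = stylesheets
    .map(href => '<link rel="stylesheet" href="' + esc(href) + '">')
    .join("");
  return "<html><head><title>" + esc(title) + "</title>" + links +
    "</head><body>" + content + "</body></html>";
};
```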
>>
>> With all the great minds on this list, surely suggestions could be made
>> to turn this early one-page code into something more and more useful??
>>
>> - Edmond -
>>
>>
>>
>>> Unless you're running with node.js, that's going to be annoying for no
>>> good reason.
>>>
>>
>>
>>> For CSRF, there is likely no impact at all. It's a question of shared
>>> data between the server and HTML forms. How that data gets there is not
>>> really important at first. I could be wrong on that one though and it
>>> might be worse than what I expect. For SQL injection, it's all about the
>>> last line of defence before sending the data to your DB engine. If you
>>> treat it in JS, God have mercy on your application.
>>>
>>> But yeah, this little security roundup was, again, a response to the
>>> 'encrypting your JS is all you need' comments. There's a safety element
>>> to using eval, and also performance, clarity and semantic concerns.
>>>
>>>> - Edmond -
>>>>
>>>>
>>>> On Mon, 14 Feb 2011 23:43:57 +1100, Frédéric Trottier-Hébert <
>>>> fred.hebert@REDACTED> wrote:
>>>>
>>>>> On 2011-02-14, at 03:35 AM, Joe Armstrong wrote:
>>>>>
>>>>>>
>>>>>> Ok so "separation of concerns" is good but having different notations
>>>>>> for expressing the concerns is crazy - to make a web thing that
>>>>>> interacts with a server you need to learn something like
>>>>>>
>>>>>>       HTML
>>>>>>       Javascript
>>>>>>       CSS
>>>>>>       PHP
>>>>>>       MySQL
>>>>>>
>>>>>> And to be able to configure Apache and MySQL - other combinations are
>>>>>> possible.
>>>>>>
>>>>>
>>>>> I can agree with that. To have a functional website, you do need to
>>>>> know a lot of different technologies. The web evolved organically and
>>>>> each part of the problem space had its own solution developed over time.
>>>>>
>>>>>
>>>>>> Then you have to split the flow of control to many places.
>>>>>>
>>>>>> All of this is crazy madness. There should be *one* notation that is
>>>>>> powerful enough to express all these things. In the browser it seems
>>>>>> sensible to forget about css and html and only use Javascript. The only
>>>>>> communication with the browser should be by sending it javascript.
>>>>>>
>>>>>
>>>>> There should, but there isn't. The truth here is that most programmers
>>>>> are awful at design. In any somewhat large setup, your backend
>>>>> programmers, your designers and your integrators (the guys just handling
>>>>> HTML and CSS, maybe some Javascript) are not necessarily the same person.
>>>>>
>>>>> Right now the ring of web technologies is divided in a way that makes
>>>>> it somewhat simple to have different people from different backgrounds
>>>>> and knowledge work on different parts of your software. It makes sense
>>>>> for the designer or integrator to be able to change the look and feel of
>>>>> a website without having to play in your code and maybe mess up
>>>>> database queries. Modern template engines in fact try to forbid all
>>>>> kinds of seriously side-effecting code (like DB queries) from happening
>>>>> in the templates.
>>>>>
>>>>> Your guy working in Javascript shouldn't have to worry about suddenly
>>>>> needing to learn Erlang to be able to debug your application. Then
>>>>> again, this separation of concerns allows specialists to work on their
>>>>> speciality with more ease. It makes things somewhat simpler in larger
>>>>> organisations, but quite painful for one-man operations. I'll tell you
>>>>> that it makes a lot of sense when you know all of the tools in the
>>>>> toolkit though :)
>>>>>
>>>>>> How you generate the javascript is irrelevant - by hand or by program -
>>>>>> who cares. If you make it by program the chances are that it's right.
>>>>>>
>>>>> Yes and no. Generated javascript is nearly as old as the language --
>>>>> many, many .NET apps had that kind of thing. Some editors like
>>>>> Dreamweaver could generate JS for you. One of the problems with this is
>>>>> that it was often pure garbage, or it wouldn't work in all browsers
>>>>> uniformly. If you can manage to generate and capture complex behaviours
>>>>> in a compliant manner, all the better. I have myself lost much hope with
>>>>> regards to that though.
>>>>>
>>>>>> Security is orthogonal to this - send encrypted js over the wire and
>>>>>> make sure your key-rings are secure; this is a completely different
>>>>>> problem.
>>>>>>
>>>>>
>>>>> This is only transmission security. Encryption has nothing to do with
>>>>> Cross-Site Scripting (XSS, where some user is able to run arbitrary JS
>>>>> in your page for you and ends up stealing information), Cross-Site
>>>>> Request Forgery (CSRF, where the attacker uses the fact your application
>>>>> is forgetting about things like the origin of the queries to hijack the
>>>>> client's session in their place. This is related to Same Origin Policy
>>>>> issues and not easy to handle), SQL injection, overwriting some
>>>>> parameters because you don't fetch them in the right order server-side
>>>>> (see problems with the $_REQUEST variable in PHP), etc.
>>>>>
>>>>> 1. XSS
>>>>> XSS is, as mentioned above, the ability to run arbitrary JS on a page.
>>>>> This is the risky thing with your eval.
>>>>> http://en.wikipedia.org/wiki/Cross-site_scripting contains many details
>>>>> on understanding the related issues. It's not always a simple matter of
>>>>> escaping. Some more advanced attacks even rely on string encoding to
>>>>> make sure your escaping fails. See
>>>>> http://www.governmentsecurity.org/forum/index.php?showtopic=18105.
>>>>>
>>>>> 2. CSRF
>>>>> CSRF is a tricky thing. Because HTTP doesn't support sessions, over the
>>>>> years, the guys from Netscape (back then) or Opera (or whoever) ended up
>>>>> using Cookies to share data on every query. What happens there is that
>>>>> on every query the browser sends to a server, it also packages the
>>>>> cookies neatly in the headers -- no matter what page you were on when
>>>>> they were sent. The issue here is that the server might not check which
>>>>> page the call is coming from.
>>>>>
>>>>> Basically, if twitter had a URL call such as
>>>>> http://twitter.com/tweet/add?message=SomeMessageHere that would
>>>>> automatically add a tweet from your account and I put that link in an
>>>>> image tag on some site, every time you would load that image, you would
>>>>> automatically make a call to the server, your browser sending in your
>>>>> cookies and making it look like YOU actually made that call, even if you
>>>>> didn't know. In this case, the request is especially easy to do because
>>>>> twitter would be using GET parameters to have side-effects on the
>>>>> server. By forcing people into using POST, you can make things harder,
>>>>> but not impossible.
>>>>>
>>>>> One way to work around POST is using a fake website -- let's say I use
>>>>> learnyousomeerlang.com. On my own site, I'll be putting a fake
>>>>> javascript form inside an iframe (so that the page doesn't refresh when
>>>>> submitted) and have the script automatically send in the POST form. Now
>>>>> I send the link to my trick page over twitter and everyone who clicks on
>>>>> it from there will be guaranteed to have their session open and sending
>>>>> in data. I've in fact used this trick to get the site owner at my old job
>>>>> to close his own admin account on his own website so he could realise
>>>>> the importance of the threat.
>>>>>
>>>>> How can you solve this one? Well there are a few ways -- for one you
>>>>> could check the HTTP referrer, but that won't work everywhere -- if you
>>>>> expect calls from Flash, it doesn't always send these elements of the
>>>>> HTTP header. In the case of HTTPS, depending on how you handle things,
>>>>> the header might not always be sent either so you can't know for sure.
>>>>> Worse still, if I'm using the <img> trick on your own website (on
>>>>> twitter, for twitter users), the domain will be the same, without you
>>>>> being able to check for anything.
>>>>>
>>>>> The only foolproof way to do this is to use what they call 'tokens':
>>>>> each call you make to the server has to have a unique piece of data that
>>>>> the server knows about that can prove that the call you just made comes
>>>>> from you, but also from your own forms on your own websites. These
>>>>> tokens should have an expiration time and be hidden from plain view,
>>>>> submitted automatically with any form. If you don't have this, your
>>>>> application might not be safe.
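A minimal sketch of such a token, written for Node.js with its built-in `crypto` module rather than in Erlang. Instead of storing every token server-side, it uses one common variant: the token is an issue time plus an HMAC over the session id and that time, keyed with a server secret. This lets the server check both that the token belongs to this session and that it hasn't expired. The helper names, the secret handling, where the session id comes from, and the 30-minute lifetime are all assumptions for illustration.

```javascript
const nodeCrypto = require("crypto");

// Token format: "<issuedAtMs>.<hex HMAC-SHA256(secret, sessionId + "." + issuedAtMs)>"
function makeCsrfToken(secret, sessionId, now = Date.now()) {
  const mac = nodeCrypto.createHmac("sha256", secret)
    .update(sessionId + "." + now)
    .digest("hex");
  return now + "." + mac;
}

// Accept only a token minted for this session and not older than maxAgeMs.
function checkCsrfToken(secret, sessionId, token, now = Date.now(), maxAgeMs = 30 * 60 * 1000) {
  if (typeof token !== "string") return false;
  const [issuedStr, mac] = token.split(".");
  const issued = Number(issuedStr);
  if (!Number.isInteger(issued) || !mac) return false;
  if (issued > now || now - issued > maxAgeMs) return false; // future or expired
  const expected = nodeCrypto.createHmac("sha256", secret)
    .update(sessionId + "." + issued)
    .digest("hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  const got = Buffer.from(mac, "utf8");
  const want = Buffer.from(expected, "utf8");
  return got.length === want.length && nodeCrypto.timingSafeEqual(got, want);
}
```

The token goes into a hidden form field (or a request header for AJAX calls) and is checked on every state-changing request. The session cookie alone proves nothing, because the browser attaches it to forged requests too.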
>>>>>
>>>>> This has *nothing* to do with encryption, and everything to do with not
>>>>> understanding the potential threats of the web correctly. It is an
>>>>> application-level issue, much like XSS is. And it's pretty damn
>>>>> important.
>>>>>
>>>>> 3. SQL injection is a different beast, where you do not properly escape
>>>>> the parameters of a request going to the database, letting someone run
>>>>> arbitrary DB calls. Erlang with Mnesia doesn't have to worry about that,
>>>>> but Erlang with any SQL has to, even if you end up using QLC (it depends
>>>>> on the library at the back in this case though, and is generally safe
>>>>> enough). http://en.wikipedia.org/wiki/Sql_injection has sufficient
>>>>> details.
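The difference is easiest to see side by side. Below is a JavaScript sketch that assumes a driver accepting the query text and the values separately; node-postgres, for example, takes a `{ text, values }` object with `$1`-style placeholders. The helper names are hypothetical, and no database is contacted.

```javascript
// UNSAFE: user input becomes part of the SQL text itself.
function unsafeFindUser(name) {
  return "SELECT id FROM users WHERE name = '" + name + "'";
}

// Safer: the SQL text is fixed and the value travels separately, so the
// database never parses it as SQL.
function findUserQuery(name) {
  return { text: "SELECT id FROM users WHERE name = $1", values: [name] };
}
```

Erlang SQL drivers generally offer a similar placeholder interface (for example `odbc:param_query/3` in OTP's odbc application). Using it keeps the escaping out of your hands.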
>>>>>
>>>>> 4. You have to consider that sometimes these attacks are combined
>>>>> together to be able to really do damage.
>>>>>
>>>>> I haven't even covered using weak hashing for passwords, bad security
>>>>> policies on cookies, opening files on dynamic paths without filtering
>>>>> the input, etc.
>>>>>
>>>>> Web application security is not a joke and it's certainly not easy.
>>>>> It's a very serious thing and most developers get it wrong at one point
>>>>> or another. WordPress got it wrong, Twitter got it wrong, Facebook got it
>>>>> wrong, Google got it wrong, and so on, even though they're supposed to
>>>>> be leaders in the field. Most of them got it wrong more than once too.
>>>>> This is why I kind of support a 'paranoid' line of thought.
>>>>>
>>>>> --
>>>>> Fred Hébert
>>>>> http://www.erlang-solutions.com
>>>>>
>>>>>
>>>>
>>>> --
>>>> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>>>>
>>>
>>>
>>
>>
