[erlang-questions] Two beautiful programs - or web programming made easy

Thu Feb 17 09:04:13 CET 2011

On Thu, Feb 17, 2011 at 5:26 AM, Edmond Begumisa <
ebegumisa@REDACTED> wrote:

> On Thu, 17 Feb 2011 02:34:00 +1100, Joe Armstrong <erlang@REDACTED>
> wrote:
>
>  2011/2/16 Edmond Begumisa <ebegumisa@REDACTED>
>>
>>  I'm a glass-half-full kinda guy :)
>>>
>>> I'd like to think this is the kind of community where someone can say ..
>>>
>>> "Hey, I've got this whacky idea. It's early stages but I think I'm onto
>>> something."
>>>
>>> Then in *addition* to the community saying "Well, there are problems x, y
>>> and z you may have overlooked", the community *also* says "Possibly p, q,
>>> and r might help with these."
>>>
>>> I think if you look at what Joe's doing, it can be expanded on to get
>>> something very useful. Let's take some of everyone's concerns and
>>> actually *try* to give Joe some advice on how they might be addressed.
>>>
>>> I've started inline, hopefully others can add some input...
>>>
>>>
>>>
>>> On Tue, 15 Feb 2011 07:37:44 +1100, Frédéric Trottier-Hébert <
>>> fred.hebert@REDACTED> wrote:
>>>
>>>
>>>
>>>> On 2011-02-14, at 15:17 PM, Edmond Begumisa wrote:
>>>>
>>>>  You've outlined a nice list of the top security concerns of *every*
>>>>
>>>>> web-developer generating dynamic content from the client-side (probably a
>>>>> good chunk of websites uploaded since 2005.)
>>>>>
>>>>> What I still don't get is why you find the generation of dynamic
>>>>> content
>>>>> from static js files acceptable while using eval for the same you find
>>>>> unacceptable. I don't see how XSS, CSRF, SQL Injection, and all the
>>>>> things
>>>>> you list are more unmanageable from static js that generates content vs
>>>>> streamed js that generates content.
>>>>>
>>>>>
>>>>>  The point of my security list wasn't so much about eval itself
>>>> as about contradicting the precise point that 'all you need to do is
>>>> encrypt javascript'. That's a reductionist and erroneous view.
>>>>
>>>> Generating dynamic content from static JS files has a few advantages:
>>>> caching on the browser side, distributing the code via CDNs rather than
>>>> through your app server,
>>>>
>>>>
>>> How do AJAX sites solve this? One way is to break the "one page app" into
>>> a
>>> few pages. Maybe he could introduce a window concept that maps to a
>>> page...
>>>
>>> Pid ! {new, window(...)}
>>> .. work .. work...
>>> Pid ! {new, window(...)}
>>>
>>> There's a start.
>>>
>>>
>>>  benefiting from JIT if available (rather than calling the
>>>
>>>> compiler/interpreter each time you update content), potential static
>>>> analysis of code (even through things like JS-Lint). Also a smaller
>>>> payload
>>>> on the network and bandwidth -- if you only send in the functions to run
>>>> the
>>>> code once rather than on every call, you'll save a lot.
>>>>
>>>>
>>> I suggested before sending parts of your app in ordinary static js files,
>>> then call the functions as libraries. The benefits above will start to be
>>> felt.
>>>
>>> Hmmm.. I wonder what Nitrogen does, they might have some tricks.
>>>
>>>
>>>  In most cases, rendering the page (CSS included), running the JS and
>>>
>>>> transferring the data counts for 90% of the time a user will wait when
>>>> querying a page. Streaming JS to then evaluate it is going to be
>>>> terrible for performance on larger scale applications.
>>>>
>>>>
>>> I dunno about this. Might be a bit of a blanket statement. Wasn't the
>>> very
>>> reason AJAX came around to INCREASE performance of larger scale
>>> applications
>>> *precisely* by streaming markup and JS for evaluation and rendering
>>> on-demand rather than all-at-once because in the latter case you normally
>>> send more than is actually needed?
>>>
>>>
>>>  There's probably more to add to the list, but that's what I can think of
>>>
>>>> in 15-30 seconds.
>>>>
>>>>
>>> Likewise, I could probably add more but these are the ideas for improving
>>> Joe's concept that I can come up with in 15-30 seconds ;) Others can
>>> pitch
>>> in.
>>>
>>>
>>>
>>>  And things like what I mentioned are not more or less unmanageable from
>>>> static files (I agree with you there), except for XSS:
>>>>
>>>> XSS is better treated in many cases by things like JS frameworks. If I'm
>>>> getting the result from some web query into JS, I will receive a neat
>>>> string, without a chance of it being wrong. If I then push this string
>>>> through my framework (say JQuery), it'll take care of doing specific
>>>> escaping of things like element attributes, element content, etc.
>>>>
>>>>
>>> What's stopping you?
>>>
>>> As I illustrated in the previous mail on security, you can call JQuery
>>> from
>>> code that's being run in eval too! Joe's calling everything from JQuery
>>> to
>>> SVG libraries!
>>>
>>>
>>>  If I do it dynamically through my applications, chances are much better
>>>
>>>> I'll get the escaping wrong in Erlang (and you need to escape on more
>>>> levels) than JS, where it can be made on a per-element basis when
>>>> building
>>>> the DOM: "create a tag, add the attribute, add another one, add the
>>>> tag's
>>>> content, push it" vs. "mix and match all these strings into hopefully
>>>> valid JS". This is even truer when you consider hacks such as Google's
>>>> UTF-7
>>>> encounter back in the day. This follows the idea that JS knows JS
>>>> better.
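
To make the "per-element" point concrete, here is a minimal JavaScript
sketch. It is not taken from jQuery or from Joe's code; the helper names
escapeHtml and el are made up for illustration. The only idea shown is that
each untrusted value is escaped at the spot where it lands (attribute value or
element content), instead of being mixed into one big string first.

```javascript
// Hypothetical helpers (not a real library API): escape a value for use in
// HTML text or inside a double-quoted attribute.
function escapeHtml(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Build one element as a string: tag, then each attribute, then the content.
// Assumption: the tag and attribute names come from trusted code; only the
// attribute values and the text content are treated as untrusted.
function el(tag, attrs, text) {
  const attrStr = Object.entries(attrs)
    .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
    .join("");
  return `<${tag}${attrStr}>${escapeHtml(text)}</${tag}>`;
}
```

In a browser the DOM can do the same job without string building at all:
document.createElement, setAttribute and textContent treat their arguments as
data rather than markup.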
>>>>
>>>>
>>>>  Right, let's convert that statement from a criticism into a really good
>>> piece of advice:
>>>
>>> Joe: That code where you're manipulating the DOM, where you do
>>> ".insertElement" and such, know what? Better do that via JQuery instead.
>>>
>>
>>
>> Why? - the only reason I can think of is cross-browser compatibility.
>>
>
> Apparently, JQuery is super smart how it handles this in relation to
> possible XSS attacks ... or so I hear. Personally, I don't really see the
> difference. Frédéric seems more informed here and would be in a better
> position to explain.
>
>
>  At the moment I don't really
>> like libraries since I want to see what's going on as near to the bottom
>> level as I can get - libraries
>> obscure what's going on. My goal is understanding, and minimal lines of
>> code
>> (to aid understanding)
>>
>>
> I too would only use a client side library for things that I cannot do
> myself (like render SVG on non-SVG supported browsers) or things that are
> too tedious to do manually (like handling cross-browser quirks). But I'm
> informed on this thread that there's more to these tools like escaping
> JS/Tags/attributes, which I'm told is really hard to do correctly from
> Erlang. Again, I'm not sure about this myself, and planned on investigating
> further.
>
>
>  Right now I have several competing ideas - I could do with some informed
>> advice here.
>>
>> I'll fire off some questions:
>>
>> For Graphics:
>>
>> 1) SVG or
>> 2) HTML5 canvas
>>
>> Canvas is faster but has no support for objects, making onclick and
>> ondrag,
>> onmouseenter in a canvas
>> is a pain and probably either eats CPU or memory. Any good libraries? -
>> I've looked at all the well-known ones. I only want object support for a
>> canvas (ie object grouping and adding click, move events to
>> object groups) - not lots of other stuff. This is why I'm currently using
>> SVG.
>>
>>
> Hmmm... Question: What is your higher priority? Having a canvas to deliver
> UI to your users or making the browser easier to program UI to?
>
> If you are not so bothered whether delivery is to an actual browser so long
> as it's done over the web using web-standards then I have a suggestion...
>
> SVG support in the Mozilla Framework is pretty OK (save a few features);
> the Gecko layout engine handles SVG natively so it renders pretty quickly
> since no JS wrapper libraries are required. What if I whipped up a little
> XULRunner client that you could push your UI to?
>
> Your users could download and install this generic client. At startup, it
> could popup a dialog asking them which app (i.e. server URI) they want to
> connect to, then open a new blank window that you could push all your SVG to.
> One could also push HTML, XUL, etc.
>
> Sure, it's not a browser that everyone already has, so your end-users will
> have to bother to install it. But they'll have to do this only once since the
> client is generic. Besides, what you're doing doesn't really fit precisely
> into the browser idiom anyway, so maybe a special client is appropriate. And
> this XULRunner-based generic client could be compiled for and installed on
> every platform that Firefox supports.
>
>
>  For the keyboard:
>>
>> How can I get keystrokes into my program from javascript - virtually
>> everything I try is buggy -
>> Do I really have to sniff the browser type and fix the bugs of every
>> single browser ..
>>
>>
> AFAIK, yes. Keystrokes are one of those cross-browser quirks library
> writers don't focus on. So you have to wade through the mess yourself. But..
>
> Another advantage of a generic XULRunner-based client: You won't have to
> deal with cross-browser quirks since you'll only be targeting 1 layout
> engine, 1 js engine, 1 DOM implementation, 1 etc, 1 etc.
>
>
>  Rich text. I want to do pixel exact typography
>>
>> I can make rich text by adding spans and css and stuff in the dom, but I
>> want pixel accurate
>> sizing of spans - I want to do the following:
>>
>> define several on screen divs with absolute size and position. Link them
>> in some order. for example say
>>
>> <div id="a" style="absolute:...." next="b">
>> <div id="b" ...                            next="c">
>>
>> Then given rich text <p><span class="c1">...</span>  I want to flow this
>> into div a, so that it overflows into div b - I need to pixel exactly
>> calculate where to split the text in order to do this.
>>
>>
> I don't think you can control the size of spans in that way. Spans are
> non-replaced inline elements, so they have no width or height properties...
>
> http://www.w3.org/TR/CSS2/visudet.html#the-width-property
> http://www.w3.org/TR/CSS2/visudet.html#the-height-property
>
> Though Mathias and David have indicated web-presentation isn't designed for
> this, I have seen people pull this kind of thing off with client-side js
> trickery (don't remember where exactly, I think it was one of those AJAX
> word-processors), but it went something like this...
>
> * Create an invisible DIV with the width property set to DIV "a" but the
> height property set to "auto"
> * Give invisible DIV the appropriate typography settings with which you
> want to measure (font, font-points, etc).
> * Populate the invisible DIV with your text. DIV should expand downwards to
> fit the text.
> * Loop, removing say 100 characters from the invisible DIV until its
> height is smaller than DIV "a"
> * Now loop adding 1 char back until invisible DIV's height is larger than DIV
> "a"
> * You've now found your sweet spot.
> * Put those chars less one into DIV "a"
> * Put the rest into DIV "b"
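
The steps above can be sketched roughly as follows. This is only an
illustration under assumptions: the function name splitToFit and the
measureHeight callback are made up, and it assumes the measured height never
shrinks as text is added. In a browser, measureHeight would put the string
into the invisible DIV and return its offsetHeight; here it is passed in so
the loop itself can be tried without a DOM.

```javascript
// Sketch of the coarse-then-fine fitting loop described above.
// measureHeight(str) is a hypothetical callback returning the rendered height
// of str, e.g. (s) => { hidden.textContent = s; return hidden.offsetHeight; }
function splitToFit(text, maxHeight, measureHeight, step = 100) {
  let n = text.length;
  // Coarse pass: drop `step` characters at a time until the text fits.
  while (n > 0 && measureHeight(text.slice(0, n)) > maxHeight) {
    n = Math.max(0, n - step);
  }
  // Fine pass: add one character back at a time, stopping just before the
  // text would overflow ("those chars less one").
  while (n < text.length && measureHeight(text.slice(0, n + 1)) <= maxHeight) {
    n += 1;
  }
  // `fits` goes into DIV "a", `rest` flows on into DIV "b".
  return { fits: text.slice(0, n), rest: text.slice(n) };
}
```

A real version would probably want to back up to the previous word boundary
rather than cut mid-word, and could start from a guess instead of the whole
text.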
>
>
http://stackoverflow.com/questions/118241/calculate-text-width-with-javascript

Which seems to answer my question.

Now I think I've got enough to write a decent in-browser editor that gets
quotes right.

<aside> why do in-browser editors not do quotes correctly? - the start quote
and end quote symbols are *different* - if anybody knows of a javascript
in-browser text editor that gets quotes right please tell me.</aside>

 /Joe

> IIRC, the algo was optimised not to start with the entire text, but take a
> reasonable guess based on the size of DIV "a". It's ugly but I remember
> being surprised how well it worked.
>
> - Edmond -
>
>
>
>
>
>
>
>  If I could do this I could easily port erlguten to run in a browser
>>
>> All these seem like pretty basic things - but I can't seem to find any
>> code
>>
>> /Joe
>>
>>
>>
>>
>>
>>
>>>
>>>  That's the same reason why you might want things like Erlang handling
>>>
>>>> Erlang parsing, SQL handling its own escaping, etc.  If you generate and
>>>> send JS as one over the wire, you will have to double-check it
>>>> server-side.
>>>>
>>>>
>>> How about adding templating? (I suggested this to Joe off-list)...
>>>
>>> One could possibly use leex/yecc to compile say "std.tpl" file and access
>>> it from Erlang code like so...
>>>
>>>  Pid ! {insert, std.grid(List)}
>>>
>>> which might use the content of std.tpl to produce...
>>>
>>>  Pid ! {insert, <<"<table>blah blah</table>">>}  % Or
>>> <<"createElement(blah)">>
>>>
>>> which might then stream to the client...
>>>
>>>  document.body.innerHTML = "<table>blah blah</table>"  /* Or
>>> createElement/or jQuery insert call */
>>>
>>> The nice thing with this is, std.tpl could have versions.
>>>
>>> Joe likes SVG: so his std.grid(List) might produce some fancy SVG code.
>>> I like XUL: so my std.grid(List) might produce "<grid>blah</grid>"
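
A rough JavaScript sketch of the "versions" idea, just to show the shape of
it. Nothing here comes from an existing std.tpl: the templates registry, the
render function and the escapeXml helper are all hypothetical, and the two
versions shown (an HTML table and a XUL grid) are only examples of what one
grid template might expand to.

```javascript
// Hypothetical helper: escape a cell value for element text or a
// double-quoted attribute.
const escapeXml = (s) =>
  String(s).replace(/[&<>"]/g, (c) =>
    ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" }[c]));

// Hypothetical template registry: each version renders the same grid data
// (an array of rows, each an array of cells) into its own markup.
const templates = {
  html: {
    grid: (rows) =>
      "<table>" +
      rows.map((r) =>
        "<tr>" + r.map((c) => `<td>${escapeXml(c)}</td>`).join("") + "</tr>"
      ).join("") +
      "</table>",
  },
  xul: {
    grid: (rows) =>
      "<grid><rows>" +
      rows.map((r) =>
        "<row>" + r.map((c) => `<label value="${escapeXml(c)}"/>`).join("") + "</row>"
      ).join("") +
      "</rows></grid>",
  },
};

// Pick a version and a template name, then expand it with the data.
function render(version, name, data) {
  return templates[version][name](data);
}
```

The calling code stays the same whichever version is installed; only the
markup that reaches the client changes.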
>>>
>>> Templating might be extendable, so you might have an app specific my.tpl
>>> which extends on std.tpl...
>>>
>>> so when you: Pid ! {new, my.wnd(..)}
>>>
>>> the client gets a new page with stylesheet references, script tags, etc,
>>> specified in the my.tpl
>>>
>>> With all the great minds on this list, surely suggestions could be made
>>> to
>>> turn this early one-paged code into something more and more useful??
>>>
>>> - Edmond -
>>>
>>>
>>>
>>>  Unless you're running with node.js, that's going to be annoying for no
>>>
>>>> good reason.
>>>>
>>>>
>>>
>>>  For CSRF, there is likely no effect at all. It's a question of shared
>>>> data between the server and HTML forms. How that data gets there is not
>>>> really important at first. I could be wrong on that one though and it
>>>> might
>>>> be worse than what I expect. For SQL injection, it's all about the last
>>>> line
>>>> of defence before sending the data to your DB engine. If you treat it in
>>>> JS,
>>>> God have mercy on your application.
>>>>
>>>> But yeah, this little security roundup was again a response to the
>>>> 'encrypting your JS is what you need' comments. There's a safety element
>>>> to using eval, and also performance, clarity and semantic concerns to be
>>>> had.
>>>>
>>>>  - Edmond -
>>>>
>>>>>
>>>>>
>>>>> On Mon, 14 Feb 2011 23:43:57 +1100, Frédéric Trottier-Hébert <
>>>>> fred.hebert@REDACTED> wrote:
>>>>>
>>>>>  On 2011-02-14, at 03:35 AM, Joe Armstrong wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>> Ok so "separation of concerns" is good but having different notations
>>>>>>> for expressing the concerns
>>>>>>> is crazy- to make a web thing that interacts with a server you need
>>>>>>> to
>>>>>>> learn something like
>>>>>>>
>>>>>>>      HTML
>>>>>>>      Javascript
>>>>>>>      CSS
>>>>>>>      PHP
>>>>>>>      MySQL
>>>>>>>
>>>>>>> And to be able to configure Apache and MySQL - other combinations are
>>>>>>> possible.
>>>>>>>
>>>>>>>
>>>>>> I can agree with that. To have a functional website, you do need to
>>>>>> know
>>>>>> a lot of different technologies. The web evolved organically and each
>>>>>> part
>>>>>> of the problem space had its own solution developed over time.
>>>>>>
>>>>>>
>>>>>>  Then you have to split the flow of control to many places.
>>>>>>>
>>>>>>> All of this is crazy madness. There should be *one* notation that is
>>>>>>> powerful enough to express all
>>>>>>> these things. In the browser it seems sensible to forget about css
>>>>>>> and html and only use Javascript.
>>>>>>> The only communication with the browser should be by sending it
>>>>>>> javascript.
>>>>>>>
>>>>>>>
>>>>>> There should, but there isn't. The truth here is that most programmers
>>>>>> are awful at design. In any somewhat large setup, your backend
>>>>>> programmers,
>>>>>> your designers and your integrators (the guys just handling HTML, and
>>>>>> CSS,
>>>>>> maybe some Javascript) are not necessarily the same person.
>>>>>>
>>>>>> Right now the ring of web technologies is divided in a way that makes
>>>>>> it somewhat simple to have different people with different backgrounds
>>>>>> and knowledge work on different parts of your software. It makes sense
>>>>>> for the designer or integrator to be able to change the look and feel
>>>>>> of a website without having to play in your code and maybe mess up
>>>>>> database queries. Modern template engines in fact try to forbid all
>>>>>> kinds of seriously side-effecting code (like DB queries) from happening
>>>>>> in the templates.
>>>>>>
>>>>>> Your guy working in Javascript shouldn't have to worry that he'd
>>>>>> suddenly need to learn Erlang to be able to debug your
>>>>>> application.
>>>>>> Then again, this separation of concerns allows specialists to work on
>>>>>> their speciality with more ease. It makes things somewhat simpler in
>>>>>> larger
>>>>>> organisations, but quite painful for one-man operations. I'll tell you
>>>>>> that
>>>>>> it makes a lot of sense when you know all of the tools in the toolkit
>>>>>> though
>>>>>> :)
>>>>>>
>>>>>>  How you generate the javascript is irrelevant - by hand or by program
>>>>>> -
>>>>>>
>>>>>>> who cares. If you make it by
>>>>>>> program the chances are that it's right.
>>>>>>>
>>>>>>>  Yes and no. Generated javascript is nearly as old as the language --
>>>>>>>
>>>>>> many, many .NET apps had that kind of thing. Some editors like
>>>>>> Dreamweaver could generate JS for you. One of the problems with this is
>>>>>> that it was often pure garbage, or it wouldn't work in all browsers
>>>>>> uniformly. If you can manage to generate and capture complex behaviours
>>>>>> in a compliant manner, all the better. I have myself lost much hope with
>>>>>> regard to that though.
>>>>>>
>>>>>>  Security is orthogonal to this - send encrypted js over the wire and
>>>>>>
>>>>>>> make sure your key-rings are secure
>>>>>>> this is a completely different problem.
>>>>>>>
>>>>>>>
>>>>>> This is only transmission security. Encryption has nothing to do with
>>>>>> Cross-Site Scripting (XSS, where some user is able to run arbitrary JS
>>>>>> in
>>>>>> your page for you and ends up stealing information), Cross Site
>>>>>> Request
>>>>>> Forgery (CSRF, where the attacker uses the fact your application is
>>>>>> forgetting about things like the origin of the queries to hijack the
>>>>>> client's session in their place. This is related to Same Origin Policy
>>>>>> issues and not easy to handle), SQL injection, overwriting some
>>>>>> parameters
>>>>>> because you don't fetch them in the right order server-side (see
>>>>>> problems
>>>>>> with the $_REQUEST variable in PHP), etc.
>>>>>>
>>>>>> 1. XSS
>>>>>> XSS is, as mentioned above, the ability to run arbitrary JS on a page.
>>>>>> This is the risky thing with your eval.
>>>>>> http://en.wikipedia.org/wiki/Cross-site_scripting contains many
>>>>>> details
>>>>>> on understanding the related issues. It's not always a simple matter
>>>>>> of
>>>>>> escaping. Some more advanced attacks even rely on string encoding to
>>>>>> make
>>>>>> sure your escaping fails. See
>>>>>> http://www.governmentsecurity.org/forum/index.php?showtopic=18105.
>>>>>>
>>>>>> 2. CSRF
>>>>>> CSRF is a tricky thing. Because HTTP doesn't support sessions, over
>>>>>> the
>>>>>> years, the guys from Netscape (back then) or Opera (or whoever) ended
>>>>>> up
>>>>>> using Cookies to share data on every query. What happens there is that
>>>>>> on
>>>>>> every query the browser sends to a server, it also packages the
>>>>>> cookies
>>>>>> neatly in the headers -- no matter what page you were on when they
>>>>>> were
>>>>>> sent. The issue here is that the server might not check what page the
>>>>>> call is coming from.
>>>>>>
>>>>>> Basically, if twitter had a URL call such as
>>>>>> http://twitter.com/tweet/add?message=SomeMessageHere that would
>>>>>> automatically add a tweet from your account and I put that link in an
>>>>>> image
>>>>>> tag on some site, every time you would load that image, you would
>>>>>> automatically make a call to the server, your browser sending in your
>>>>>> cookies and making it look like YOU actually made that call, even if
>>>>>> you
>>>>>> didn't know. In this case, the request is especially easy to do
>>>>>> because
>>>>>> twitter would be using GET parameters to have side-effects on the
>>>>>> server. By
>>>>>> forcing people into using POST, you can make things harder, but not
>>>>>> impossible.
>>>>>>
>>>>>> One way to work against POST is using a fake website -- let's say I use
>>>>>> learnyousomeerlang.com. On my own site, I'll be putting a fake
>>>>>> javascript form inside an iframe (so that the page doesn't refresh
>>>>>> when
>>>>>> submitted) and have the script automatically send in the POST form.
>>>>>> Now I
>>>>>> send the link to my trick page over twitter and everyone who clicks on
>>>>>> it
>>>>>> from there will be guaranteed to have their session open and sending
>>>>>> in
>>>>>> data. I've in fact used this trick to get the site owner at my old
>>>>>> job to
>>>>>> close his own admin account on his own website so he could realise the
>>>>>> importance of the threat.
>>>>>>
>>>>>> How can you solve this one? Well there are a few ways -- for one you
>>>>>> could check the HTTP referrer, but that won't work everywhere -- if
>>>>>> you
>>>>>> expect calls from flash, it doesn't always send these elements of the
>>>>>> HTTP
>>>>>> header. In the case of HTTPS, depending on how you handle things, the
>>>>>> header
>>>>>> might not always be sent either so you can't know for sure. Better
>>>>>> than
>>>>>> that, if I'm using the <img> trick on your own website (on twitter,
>>>>>> for
>>>>>> twitter users), the domain will be the same, without you being able to
>>>>>> check
>>>>>> for anything.
>>>>>>
>>>>>> The only foolproof way to do this is to use what they call 'tokens':
>>>>>> each call you make to the server has to have a unique piece of data
>>>>>> that the
>>>>>> server knows about that can prove that the call you just made comes
>>>>>> from
>>>>>> you, but also from your own forms on your own websites. These tokens
>>>>>> should
>>>>>> have an expiration time and be hidden from plain view, submitted
>>>>>> automatically with any form. If you don't have this, your application
>>>>>> might
>>>>>> not be safe.
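
The token idea can be sketched like this. It is a minimal illustration under
assumptions, not what any of the sites mentioned actually do: instead of the
server remembering every token, it signs the session id plus an expiry time
with an HMAC (a stateless variant of the same idea). The names issueToken and
verifyToken and the secret are made up; the only dependency is Node's built-in
crypto module.

```javascript
const nodeCrypto = require("crypto");

// HMAC over "sessionId:expires" with a server-side secret (hypothetical names).
function sign(secret, sessionId, expires) {
  return nodeCrypto
    .createHmac("sha256", secret)
    .update(`${sessionId}:${expires}`)
    .digest("hex");
}

// Issue a token that is tied to one session and expires after ttlMs.
function issueToken(secret, sessionId, ttlMs, now = Date.now()) {
  const expires = now + ttlMs;
  return `${expires}.${sign(secret, sessionId, expires)}`;
}

// Accept the token only if it has not expired and the signature matches
// what the server would have produced for this session.
function verifyToken(secret, sessionId, token, now = Date.now()) {
  const [expStr, mac] = String(token).split(".");
  const expires = Number(expStr);
  if (!Number.isFinite(expires) || !mac || now > expires) return false;
  const expected = Buffer.from(sign(secret, sessionId, expires));
  const given = Buffer.from(mac);
  // Constant-time comparison; lengths must match before timingSafeEqual.
  return given.length === expected.length &&
    nodeCrypto.timingSafeEqual(given, expected);
}
```

The token would travel in a hidden form field and be checked on every request
that changes state, so a forged request from another site, which cannot read
the form, has nothing valid to send.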
>>>>>>
>>>>>> This has *nothing* to do with encryption, and everything to do with
>>>>>> not
>>>>>> understanding the potential threats of the web correctly. It is an
>>>>>> application-level issue, much like XSS is. And it's pretty damn
>>>>>> important.
>>>>>>
>>>>>> 3. SQL injection is a different beast, where you do not properly
>>>>>> escape
>>>>>> the parameters of a request going to the database, letting an attacker run
>>>>>> arbitrary DB calls. Erlang with Mnesia doesn't have to worry about
>>>>>> that, but
>>>>>> Erlang with any SQL has to, even if you end up using QLC (it depends
>>>>>> on the
>>>>>> library at the back in this case though, and is generally safe
>>>>>> enough).
>>>>>> http://en.wikipedia.org/wiki/Sql_injection has sufficient details.
>>>>>>
>>>>>> 4. You have to consider that sometimes these attacks are combined
>>>>>> together to be able to really do damage.
>>>>>>
>>>>>> I haven't even covered using weak hashing for passwords, bad security
>>>>>> policies on cookies, opening files on dynamic paths without filtering
>>>>>> the
>>>>>> input, etc.
>>>>>>
>>>>>> Web application security is not a joke and it's certainly not easy.
>>>>>> It's
>>>>>> a very serious thing and most developers get it wrong at one point or
>>>>>> another. WordPress got it wrong, Twitter got it wrong, Facebook got it
>>>>>> wrong, Google got it wrong, and so on, even though they're supposed to
>>>>>> be
>>>>>> leaders in the field. Most of them got it wrong more than once too.
>>>>>> This is
>>>>>> why I kind of support a 'paranoid' line of thought.
>>>>>>
>>>>>> --
>>>>>> Fred Hébert
>>>>>> http://www.erlang-solutions.com
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>
>