Saving property with HTML - encode on entry, or on display?

I have a system which allows users to enter HTML-reserved characters into a text area, then post that to my application. That information is then saved to a database for later retrieval and display. Alarms are (should be) going off in your head. I need to make sure that I avoid XSS attacks, because I will display this data somewhere else in the application. Here are my options as I see it:
Encode before save to DB
I can HTML-encode the data on the way into the database, so raw HTML characters never reach the database.
Pros:
Developers don't have to remember to HTML-encode the data when it's displayed on the web page.
Cons:
The data no longer makes sense for desktop-based applications (or anything other than HTML). Stuff shows up like &lt; &gt; &amp; etc.
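To make this con concrete, here is a small Python sketch. `html.escape` and `html.unescape` stand in for whatever encoder your stack uses, and the sample input is made up. Encoding on entry stores entities, so every non-HTML consumer either shows them or has to decode them again:

```python
import html

# Hypothetical user input, exactly as typed into the text area.
user_input = "Fish & Chips <3"

# Option 1: encode on the way in, so the stored value is already HTML.
stored = html.escape(user_input)

# A non-HTML consumer (desktop app, CSV export) now sees entities...
print(stored)                 # Fish &amp; Chips &lt;3
# ...and recovering the original means undoing the encoding.
print(html.unescape(stored))  # Fish & Chips <3
```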
Don't HTML encode before saving to DB
I can HTML encode the data whenever I need to display it on a web page.
Pros:
Feels right because it keeps the integrity of the data that was entered by the user.
Allows non-HTML based applications to just display this data without having to worry about HTML encoding.
Cons:
We might display this data in a lot of places, and we'll have to make sure that every developer knows that when you display this field, you'll need to HTML encode it.
People forget things. There WILL be at least one instance when we forget to HTML-encode the data.
Scrub the data before saving to DB (don't HTML encode)
I can use a well-tested third-party library to remove potentially dangerous HTML and get a safe HTML fragment to save to the database, not HTML-encoded.
Pros:
Preserves most of the original input so that display in a non-HTML format makes sense.
Less catastrophic if the developer forgets to HTML encode this information for display on a web page.
Cons:
Still messes with the data as the user originally entered it. If they really want to type a <script> or <object> tag, it won't make it, and we'll get support calls and emails because of that.
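For the scrub option, a real project should rely on a maintained sanitizer library. Purely to illustrate what allowlist scrubbing means, here is a toy Python sketch built on the standard library's `html.parser`. The allowed tags, the dropped elements, and the `sanitize` name are assumptions for illustration, and the sketch is not hardened against every browser parsing quirk. Note how the `<script>` element and its content simply vanish, which is exactly the con described above:

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a tiny allowlist for illustration only.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}
DROP_CONTENT = {"script", "style", "object"}

class AllowlistSanitizer(HTMLParser):
    """Keeps allowlisted tags (stripping all attributes), escapes all text,
    and drops script/style/object elements together with their content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes such as onclick are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))  # text is always re-escaped

def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.out)

print(sanitize('<b onclick="x()">hi</b><script>alert(1)</script> & bye'))
# <b>hi</b> &amp; bye
```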
My question is: What is the best option, or if there is another way of going about this, what is it?

The right thing to do is not mangle/change user input.
So, do not encode before saving.
Yes, this puts the onus on the developers to remember and know that they need to encode anything coming out of the DB, but this is good practice regardless.
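A minimal Python sketch of this approach follows. It assumes an in-memory SQLite table stands in for the real database, and `render_comment` is a hypothetical helper. The raw text is stored unchanged, and encoding happens in a single display function. Routing every HTML rendering of the field through one helper (or through a template engine that escapes by default) is one way to soften the "people forget" con.

```python
import html
import sqlite3

# In-memory database standing in for the real one (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

# Store exactly what the user typed; no encoding on the way in.
raw = "I <3 <script>alert('x')</script>"
db.execute("INSERT INTO comments (body) VALUES (?)", (raw,))

def render_comment(body: str) -> str:
    """Hypothetical single display helper: every HTML page that shows a
    comment goes through here, so encoding happens in exactly one place."""
    return f"<p>{html.escape(body)}</p>"

(body,) = db.execute("SELECT body FROM comments WHERE id = 1").fetchone()
print(render_comment(body))
# <p>I &lt;3 &lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;</p>
```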

Related

Safe markup/way to display text in Web UI from external source

So, I have a web UI written in Angular that calls some REST/JSON web services.
Those web services can return some plain-text data representing some kind of log.
The UI can show those plain-text data to the user. Today, this data is displayed in a <pre></pre> area that allows ASCII presentation.
However, I would like to add a way for the logs to be more rich and to provide more functionalities:
color part of the text;
space formatting (like alignment);
maybe have a piece of text that is a link to another resource;
others...
So far, I've come up with:
allow the plain-text data to be HTML code, so that it can be displayed directly. But that opens the door to script injection, so it's a no-go;
allow the plain text to be some kind of markup language such as Markdown, but Markdown also allows HTML markup, so would it be safe?;
create a new sort of JSON schema used to describe the now plain-text data, and rely on that to provide some formatting on the UI side. But to me this means reinventing some sort of wheel…
I can't seem to hit on the right idea right now.
Would you know of a better way to do that? Perhaps a standard format already exists that would allow me to do that?
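One possible sketch of the third option, in Python: the log is sent as a list of segments, and the UI builds markup only from known fields, escaping every piece of text. The segment fields (`text`, `color`, `link`), the color allowlist, and the CSS class names are all hypothetical, not an existing standard. In Angular the same logic could be expressed in a template with data binding rather than by concatenating strings; the Python version only shows the idea.

```python
import html
from urllib.parse import urlparse

# Hypothetical log format: a list of segments with text plus optional styling.
ALLOWED_COLORS = {"red", "green", "orange", "gray"}

def render_segment(seg: dict) -> str:
    text = html.escape(seg["text"])  # log text is never treated as markup
    color = seg.get("color")
    if color in ALLOWED_COLORS:      # unknown colors are ignored, not passed through
        text = f'<span class="log-{color}">{text}</span>'
    link = seg.get("link")
    if link and urlparse(link).scheme in ("http", "https"):  # rejects javascript: URLs
        text = f'<a href="{html.escape(link)}">{text}</a>'
    return text

def render_log(segments: list) -> str:
    # <pre> keeps the existing ASCII alignment of the plain-text log.
    return "<pre>" + "".join(render_segment(s) for s in segments) + "</pre>"

log = [
    {"text": "ERROR ", "color": "red"},
    {"text": "disk <sda> full, see "},
    {"text": "runbook", "link": "https://example.com/runbook"},
]
print(render_log(log))
```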

MySQL HTML sanitization

I have a website that saves data to a MySQL database
Should I escape the HTML upon inserting it into MySQL or upon displaying it on my website?
Ideally, I'd like to input raw HTML into my database and just sanitize each time I pull from it. Is there any danger in doing it this way?
Example HTML: <h1>test</h1>
Typically users won't save HTML, but I don't want them to be restricted. Of course, that HTML won't be executed. It will just be displayed.
Should I escape the HTML upon inserting it into MySQL or upon displaying it on my website?
Then you don't have HTML to begin with. You have plain text.
Escaping plain text to be injected into HTML is a fast operation and, unless we're talking about 1 GB worth of text in a single row, it doesn't make sense to cache it. If you convert the plain text to HTML before saving it, you no longer have the original text, and you're forced to undo the encoding whenever you use it outside an HTML context (e.g., put it in a JavaScript variable or use it as an e-mail subject).
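To illustrate the point about contexts, here is a Python sketch that assumes the stored value is plain text. The same raw string is encoded differently depending on where it goes, which is only possible if the database holds the original. The `</` replacement is a common precaution for JSON embedded in a `<script>` element, not something every stack requires.

```python
import html
import json

raw = 'Tom & "Jerry" </script>'  # plain text as stored in MySQL

# HTML element content: escape for HTML.
html_fragment = "<h1>" + html.escape(raw) + "</h1>"

# JavaScript variable inside a <script> block: JSON-encode, and also
# break up "</" so the text cannot close the script element early.
js_literal = json.dumps(raw).replace("</", "<\\/")
script = f"<script>var title = {js_literal};</script>"

# E-mail subject: plain text, no HTML encoding at all.
subject = raw
```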

Laravel Save Markdown to Database - Don't Understand

I am reluctant to post this, but I am having trouble understanding how markdown actually "saves" to a database.
When I'm creating a migration, I will add columns and specify the type of value (i.e. integer, text, string, etc.) and in the course of operation on the website, users will input different information that is then saved in the DB. No problem there.
I just can't seem to wrap my head around the process for markdown. I've read about saving the HTML or saving the markdown file, rendering at runtime, pros and cons all that.
So, say I use an editor like TinyMCE which attaches itself to a textarea. When I click "Submit" on the form, how does that work? How does validation work? Feel free to answer my question directly or point me to a resource that would further my understanding. I have an app built on Laravel, so I'm guessing I'll need to use a package like https://github.com/GrahamCampbell/Laravel-Markdown along with an editor (i.e. TinyMCE).
Thanks!
Let's start with a more basic example: StackOverflow. When you are writing/editing a question or answer, you are typing Markdown text into a textarea field. And below that textarea is a preview, which displays the Markdown text converted to HTML.
The way this works (simplified a little) is that StackOverflow uses a JavaScript library to parse the Markdown into HTML. This parsing happens entirely client side (in the browser) and nothing is sent to the server. With each key press in the textarea the preview is updated quickly because there is no back-and-forth with the server.
However, when you submit your question/answer, the HTML in the preview is discarded and the Markdown text from the textarea is forwarded to the StackOverflow server, where it is saved to the database. At some point the server also converts the Markdown to HTML so that when another user comes along and requests to view that question/answer, the document is sent to the user as HTML by the server. I say "at some point" because this is where you have to decide when the conversion happens. You have two options:
If the server converts the Markdown to HTML when it saves it to the database, then it will save to two columns, one for the Markdown and one for the HTML. Later, when a user requests to view the document, the HTML document will be retrieved from the database and returned to the user. However, if a user requests to edit the document, then the Markdown document will be retrieved from the database and returned to the user so that she can edit it.
If the server only stores the Markdown text to the database, then when a user requests to view the document, the Markdown document will be retrieved from the database, converted to HTML and then returned to the user. However, if a user requests to edit the document, then the Markdown document will be retrieved from the database and returned to the user (skipping the conversion step) so that she can edit it.
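The two options can be sketched in Python as follows. `markdown_to_html` is a hypothetical toy stand-in for a real Markdown library (it only escapes HTML and handles `**bold**`), an in-memory SQLite table stands in for the Laravel migration, and the function names are illustrative only. In both options, editing returns the Markdown, never the HTML.

```python
import html
import re
import sqlite3

def markdown_to_html(md: str) -> str:
    """Toy stand-in for a real Markdown library (assumption): escapes the
    text, then turns **bold** into <strong>. Real renderers do far more."""
    escaped = html.escape(md)
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, markdown TEXT, html TEXT)")

# Option 1: convert once on save and keep both columns.
def save_with_html(md: str) -> int:
    cur = db.execute("INSERT INTO posts (markdown, html) VALUES (?, ?)",
                     (md, markdown_to_html(md)))
    return cur.lastrowid

def view_option1(post_id: int) -> str:
    return db.execute("SELECT html FROM posts WHERE id = ?", (post_id,)).fetchone()[0]

# Option 2: store only the Markdown and convert on every view.
def save_markdown_only(md: str) -> int:
    cur = db.execute("INSERT INTO posts (markdown) VALUES (?)", (md,))
    return cur.lastrowid

def view_option2(post_id: int) -> str:
    md = db.execute("SELECT markdown FROM posts WHERE id = ?", (post_id,)).fetchone()[0]
    return markdown_to_html(md)

# Editing is the same in both options: hand back the Markdown, not the HTML.
def edit(post_id: int) -> str:
    return db.execute("SELECT markdown FROM posts WHERE id = ?", (post_id,)).fetchone()[0]
```

Option 1 trades extra storage (and a re-render if the converter changes) for cheaper reads; option 2 keeps a single source of truth at the cost of converting on every view.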
Note that in either option, the server is doing the conversion to HTML. The only time the conversion happens client-side (in the browser) is for preview. But the "preview" conversion is not used to display the document outside of edit mode or to store the document in the database.
The only difference between something like StackOverflow and TinyMCE is that in TinyMCE the preview is also the editor. Behind the scenes the same process is still happening and when you submit, it is the Markdown which is sent to the server. The HTML used for preview is still discarded.
The primary concern when implementing such a system is that if the Markdown implementation used for preview is dissimilar from the implementation used by the server, the preview may not be very accurate. Therefore, it is generally best to choose two implementations that are very similar or, if available, use the same implementation for both.
It is actually very simple.
Historically, forums used BBCode, which is basically a set of pseudo-tags that let you format your text in some way. For example, [b][/b] used to mean "make this text bold". Markdown does exactly the same thing, but with other characters, like *text* or **text**.
This way you only allow your users to use specific formatting; if you allowed them to write raw HTML, XSS (cross-site scripting) issues would arise, which is not a good idea.
You should then save the HTML in the database. You can use, for example, markdown-js, which is a Markdown parser that converts Markdown to HTML.
I have seen that TinyMCE does not use Markdown by default, since it's simply a WYSIWYG editor; however, it seems it also supports a Markdown-like formatting.
Laravel-Markdown is a server-side markdown render helper, you can use this on Laravel Blade views. markdown-js is instead client-side, it can be used, for example, to show a preview of what you're writing in real-time.

ASP.NET MVC XSS Input Field strip HTML/Scripts or Sanitize

I'm using the ASP.NET MVC AntiXssEncoder to prevent XSS for INPUT fields on the Registration Form.
However, on the Update page the user sees the following:
Input Test &lt;b&gt;abc&lt;/b&gt;
What's the best practice for this scenario?
1. Sanitize or Remove all HTML and Script Tags
Thanks.
When you threat-model or build a threat profile for the application you are dealing with, it will become clear which systems (apps, web pages) you are interfacing and communicating with. You will get to know where you receive input from and where you send output to. You can then decide whether
you want to sanitize the input and store it in a database
or just store malicious input (remember even if <script>script('foo')</script> is in the database it is still malicious when it gets reflected on the page) in the database, and then prevent it from being executed at the time of display in the browser or web application.
It would not be very apt to give a conclusive answer that you should either sanitize the input before storing it in a database (for example, if the user inputs the string
Alex <script>window.location='http://evil.com';</script>, then you should store only Alex, purging the malicious input from the entered string before storing it in the database) or not worry about input sanitization and depend on output encoding.
But the best practice is to implement security in depth, in multiple layers. Ideally, you should be doing both input sanitization and output encoding, because if you do not sanitize input, the malicious script in your database may get reflected in some other application that you feed your data to.
Having said all that, in your case I think that instead of seeing Input Test &lt;b&gt;abc&lt;/b&gt; you want to see Input Test followed by abc in bold. When you look at the response (HTML source) of the page, you should have Input Test <b>abc</b>, which the browser displays as Input Test with abc in bold.
However, if you see Input Test &lt;b&gt;abc&lt;/b&gt; in the browser, then I think your response (use the View Source option or capture it with Fiddler) contains &amp;lt;b&amp;gt;abc&amp;lt;/b&amp;gt;. If this is the case, then you are running into the problem of double encoding (actually double HTML output encoding). If you are using the Razor view engine, then unless you use Html.Raw, every string you output is HTML-encoded by default by Razor's @ syntax. If you are using the ASPX view engine, then anything you put inside the <%: %> block is output-encoded.
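The double-encoding symptom can be reproduced in a few lines of Python. Here `html.escape` stands in for both the input-side encoder and the view's automatic output encoding (an assumption; the ASP.NET encoders are not being called here), and `html.unescape` approximates the single decoding step a browser performs when rendering text:

```python
import html

user_input = "Input Test <b>abc</b>"

stored = html.escape(user_input)  # encoded on save (e.g. by the input encoder)
response = html.escape(stored)    # encoded again by the view on output

# A browser decodes entities once when rendering text; html.unescape
# approximates that to show what the user ends up reading.
print(html.unescape(response))    # Input Test &lt;b&gt;abc&lt;/b&gt;

# Fix: store the raw text and encode only on output.
print(html.unescape(html.escape(user_input)))  # Input Test <b>abc</b>
```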
Your question about best practice is answered, in my opinion. Please comment or edit the question if you want to know anything else (related to this question).
Sanitize on output, not on input.
It is safe to store <script>alert(foo)</script> in your database - it has no meaning there.
It is only when you output this to an HTML page that you need to encode special characters, i.e. you need to output
&lt;script&gt;alert(foo)&lt;/script&gt;
so
<script>alert(foo)</script>
is displayed in your page, not executed.
This way, if you need to output to a text file, you are not printing &lt;script&gt;alert(foo)&lt;/script&gt; to the file instead of <script>alert(foo)</script>; the encoded form has no meaning in that context.
In your case it appears you are encoding it twice. You should remove the code that encodes it before storage, and only encode on output to the page (using <%: %> tags).

UIWebView or Core Text

New to Objective-C and iOS.
I am writing an app that pulls in data via JSON from a PHP server. At a high level, that data is often in arrays that contain different types of data (headlines, body text, certain data types that require certain formatting like italics, etc.). To render a page, I plan to just walk the arrays and alter the size, formatting of text as a new data element appears (i.e., this is a headline, make it bold and bigger... this is a sub-headline, make it italics, etc.)
I will need to display the text (and pull some images in as well) on a single view. The application won't know the structure of the data until it receives the JSON (for example, when, where, or how often headlines show up). I may, or may not want to be able to capture actions from the rendered text (i.e., clicking on a headline spawning a new view, etc.)
What do people typically do? I know Core Text is out there, and, from what I've seen, it's fairly difficult to work with--and, even the tutorials produce pretty bland formatting. I've also seen indication that people just use a UIWebView and generate the HTML on the fly and just display it using HTML.
If UIWebView is the best and easiest solution, I'll probably just do that. But I also don't want to use a technique that is frowned upon or that I will discover down the road has serious limitations. It also seems a bit strange for an app to (in parts) just be a glorified webpage. But perhaps that's what people do given the tools that are available (and, certainly, HTML does what it does fairly well).
Thoughts?
I certainly recommend using a UIWebView, unless you're already familiar with Core Text.
I'm currently using a web view myself to show a formatted log in my application. I've several methods which all create specific HTML code and then append the HTML to an NSMutableString. Using a UIWebView allows you to easily change content.
As CSS is fully available, generating styled pages isn't difficult at all.
JavaScript can be used too, so scrolling your view is just as easy as telling the web view to execute some JavaScript. And of course the scripting language brings you a lot of other features, too.
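Walking the JSON array and appending HTML snippets can be sketched in Python. The answer does this in Objective-C with an `NSMutableString`; the item types, helper names, and CSS below are hypothetical. Because the text arrives from a server, each piece is HTML-escaped before it is appended, so a stray `<` in the data cannot inject markup or script into the web view. On iOS the resulting string would then be handed to the web view, for example via `loadHTMLString:baseURL:`.

```python
import html

# Python analogue of appending HTML snippets to an NSMutableString.
# Item types, helper names, and CSS classes are hypothetical.
CSS = "<style>.headline{font-size:1.4em;font-weight:bold}.sub{font-style:italic}</style>"

def headline(text: str) -> str:
    return f'<div class="headline">{html.escape(text)}</div>'

def subheadline(text: str) -> str:
    return f'<div class="sub">{html.escape(text)}</div>'

def body(text: str) -> str:
    return f"<p>{html.escape(text)}</p>"

RENDERERS = {"headline": headline, "subheadline": subheadline, "body": body}

def build_page(items: list) -> str:
    parts = [CSS]                    # plays the role of the NSMutableString
    for item in items:               # walk the JSON array in order
        render = RENDERERS.get(item["type"], body)  # unknown types fall back to body text
        parts.append(render(item["text"]))
    return "".join(parts)

page = build_page([
    {"type": "headline", "text": "Q3 results"},
    {"type": "body", "text": "Revenue < forecast & rising"},
])
```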

Resources