Saturday, December 31, 2016

Information Security: Data Flow Diagrams

Continuing from my last post on STRIDE and threat models, I will discuss Data Flow Diagrams (DFDs) in this post. DFDs are useful tools in threat modeling. They capture system details and data flows using a common set of elements in a standard format. At a high level, for example, a DFD can show data flowing from the GUI used by an end user to the file system where the threat model is saved.

Following are the 4 major components of a DFD:
  • Process Elements
    • Any element that is an active processor of data
    • These elements are the focus of DFD
    • There must be a sensible breakdown of system processes
    • Represented by a single circle
  • External Interaction Elements
    • These are the points where data originates, and also its final destination
    • For example: valid end users, authenticated third-party systems, etc.
    • These interactors can create, read, update and delete data (CRUD operations)
    • While these are similar to Process Elements, they are beyond the control and scope of the system developers
    • It is important that the threat model contains a summary of the interactions with these elements
    • A list of security promises made by the elements should also be documented
    • The role that each interactor plays must also be captured in the threat model
    • Represented by rectangles
  • Data Store Elements: Passive Storage
    • These elements represent data being stored; no computation is performed on it, and hence no processing happens here
    • Represented by horizontal parallel lines (imagine a rectangle with its vertical sides missing)
  • Data Flow Elements
    • These elements represent inputs and outputs between other elements
    • Represented by one-directional arrows
[Figure: Representation of different elements]
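To make the four element types concrete, here is a minimal Python sketch that models them as plain data. It then pulls out the flows that touch an external interactor, because the threat model should summarise those interactions. The class and function names (`Element`, `DataFlow`, `external_interactions`) and the sample elements are hypothetical illustrations. They are not part of any threat-modeling tool.

```python
from dataclasses import dataclass
from enum import Enum


class ElementKind(Enum):
    PROCESS = "process"                # drawn as a single circle
    EXTERNAL_INTERACTOR = "external"   # drawn as a rectangle
    DATA_STORE = "data store"          # drawn as two horizontal parallel lines


@dataclass(frozen=True)
class Element:
    name: str
    kind: ElementKind


@dataclass(frozen=True)
class DataFlow:
    # A data flow element: a one-directional arrow from source to destination.
    source: Element
    destination: Element
    label: str


def external_interactions(flows: list[DataFlow]) -> list[DataFlow]:
    """Return the flows that start or end at an external interactor.

    These are the interactions the threat model should summarise."""
    return [
        f for f in flows
        if ElementKind.EXTERNAL_INTERACTOR in (f.source.kind, f.destination.kind)
    ]


# Hypothetical elements, loosely based on the GUI / file system example above.
user = Element("End user", ElementKind.EXTERNAL_INTERACTOR)
gui = Element("Threat model GUI", ElementKind.PROCESS)
files = Element("File system", ElementKind.DATA_STORE)

flows = [
    DataFlow(user, gui, "edits threat model"),
    DataFlow(gui, user, "renders diagram"),
    DataFlow(gui, files, "saves threat model"),
]

for f in external_interactions(flows):
    print(f"{f.source.name} -> {f.destination.name}: {f.label}")
```

The GUI-to-file-system flow is left out of the result because neither of its endpoints is an external interactor.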


Apart from the above, the following representations are also used, depending on the complexity at hand:
  • Boundaries
    • The boundaries represent privilege differentials between different elements where applicable
    • A boundary may be of multiple types:
      • Process boundary: cross-process data flows on the same machine
      • Machine boundary: data flows between machines across a network
      • Trust boundary: marks a change in privilege in the general case
      • Other boundary: generic
    • Represented by a dotted red line
  • Multiple Process Elements
    • Sometimes, in the interest of space and ease of representation, multiple processes may be abstracted at a higher level using this element
    • Each multiple process element is expanded in its own separate DFD
    • That DFD can in turn represent its sub-components, including further multiple process elements
    • Represented by concentric circles
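One practical use of boundaries is to find every data flow that crosses one, because those flows deserve the closest scrutiny. The sketch below assumes a simplified model in which each element is tagged with a single hypothetical zone name, such as a trust level or a machine. Real diagrams may nest several boundary types, which this does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str


def crossing_flows(flows: list[Flow], zone_of: dict[str, str]) -> list[Flow]:
    """Return flows whose endpoints sit on different sides of a boundary.

    zone_of maps each element name to the zone it belongs to; any flow
    between two different zones crosses a boundary."""
    return [f for f in flows if zone_of[f.source] != zone_of[f.destination]]


# Hypothetical web app: the browser is on the internet, while the web server
# and database sit inside the data centre, behind a trust boundary.
zone_of = {
    "Browser": "internet",
    "Web server": "datacentre",
    "MySQL": "datacentre",
}
flows = [
    Flow("Browser", "Web server"),
    Flow("Web server", "Browser"),
    Flow("Web server", "MySQL"),
]

for f in crossing_flows(flows, zone_of):
    print(f"{f.source} -> {f.destination} crosses a boundary")
```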

Any system of sufficient complexity will require multi-level DFDs, and as a result the DFDs can become very complex and hard to read. In such cases, context diagrams are used to present information in an easy-to-understand manner.

Context diagrams are useful for dividing systems based on physical realities. For example, a context diagram may be based on major components (rather than a high-level logical diagram), so that it is functionally comprehensible. When there are multiple, equally realistic ways to divide a system, grouping it into logical divisions helps.

Thus, overall, a large system can organize its DFDs as follows:
Context diagram -> L1 diagram -> L2 diagram and so on at each element
Engineers should ideally keep decomposing the system until the relevant security detail is captured. Only then can the DFD be considered complete. For example, for a web app, this could mean not caring how data is stored within MySQL's internals, as long as it reaches the database.
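The Context -> L1 -> L2 organization is essentially a tree. As a rough sketch, each element can hold the next-level elements it expands into. The `DfdNode` structure and the sample web app are assumptions made for illustration. The leaves are the points where someone decided that no further security-relevant detail was needed, as in the MySQL example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DfdNode:
    name: str
    # Elements of the next-level diagram this element expands into.
    # An empty list means it has not been decomposed any further.
    children: list[DfdNode] = field(default_factory=list)


def outline(node: DfdNode, level: int = 0) -> list[str]:
    """List the hierarchy as 'Context', 'L1', 'L2', ... lines."""
    label = "Context" if level == 0 else f"L{level}"
    lines = [f"{'  ' * level}{label}: {node.name}"]
    for child in node.children:
        lines.extend(outline(child, level + 1))
    return lines


def leaves(node: DfdNode) -> list[str]:
    """Elements not decomposed further. For each one, someone should decide
    whether more detail would still be security-relevant."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


# Hypothetical decomposition of a small web app.
web_app = DfdNode("Web app", [
    DfdNode("Web server", [DfdNode("Auth handler"), DfdNode("Request router")]),
    DfdNode("MySQL"),  # a black box: we only care that data reaches it
])

print("\n".join(outline(web_app)))
print(leaves(web_app))
```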

Sunday, December 25, 2016

Information Security: Threat Models and STRIDE

I began writing about Information Security concepts a few months back, and this post follows my last post on the topic, Information Security: Output Encoding and Error Messaging. In this post, I'm going to talk about threat models: what they are, and how to build one.

Now, when we talk of threat models, it is important to switch to an attacker's perspective and then visualize the software. In most projects, software engineers are assigned work based on functional groupings. Systems are usually partitioned at each level of complexity for easier assignment to developers. It thus becomes important to communicate assumptions and manage complexity through contracts and specifications.

Many methods, like UML (Unified Modeling Language) diagrams, exist for communicating designs. The key to building large-scale systems is abstracting every component and the principles involved. To build secure apps, one needs to make sure that all team members are aware of any assumptions involved.

Moving on to threat models, it is important to be aware of the key terms involved, and how they differ from each other:
  • Assets – It could refer to the users or systems or information or anything that can be assigned a value. An asset represents what needs to be protected.
  • Vulnerability – A vulnerability is simply a design or implementation flaw that can be exploited by an attacker to gain unauthorized access to an asset. A vulnerability represents the weakness in our protection efforts.
  • Threat – Anything (person, program, bot etc.) that may exploit a vulnerability, intentionally or accidentally. The attacker's aim could be to obtain, damage, or destroy asset(s). A threat represents what needs to be protected against.
  • Risk – The potential for loss, damage or destruction of an asset which results from a threat exploiting a vulnerability. Risk is the intersection of assets, threats, and vulnerabilities.

In layman's terms, the four entities above are related by the equation:
A (Assets) + T (Threat) + V (Vulnerability) = R (Risk)
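The equation is shorthand rather than arithmetic. It can be read as "a risk exists only where an asset, a threat and a vulnerability meet", and the relationship can be sketched in Python as below. The data shapes and sample entries are hypothetical. The point is only that removing any one of the three removes the risk.

```python
def risks(assets, vulnerabilities, threats):
    """Return (threat, vulnerability, asset) triples that amount to a risk.

    assets:          set of things worth protecting
    vulnerabilities: dict mapping each weakness to the asset it exposes
    threats:         dict mapping each threat to the weaknesses it can exploit
    """
    found = []
    for threat, exploitable in threats.items():
        for vuln in sorted(exploitable):
            asset = vulnerabilities.get(vuln)
            # A risk needs all three: a valued asset, a weakness exposing it,
            # and a threat able to exploit that weakness.
            if asset in assets:
                found.append((threat, vuln, asset))
    return found


# Hypothetical inputs.
assets = {"customer records"}
vulnerabilities = {
    "SQL injection in search": "customer records",
    "verbose error pages": "stack traces",  # not a listed asset
}
threats = {
    "external attacker": {"SQL injection in search", "verbose error pages"},
    "curious insider": set(),  # no exploitable weakness, so no risk
}

print(risks(assets, vulnerabilities, threats))
```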
Since attackers want systems and assets, they usually target the weakest links in the chain. After all, your overall system is only as secure as its weakest link, and therefore we need to understand the ways in which we can be targeted. Important threats need to be captured, and mitigated threats documented. One also needs to create negative use cases. For example, what happens when something that is supposed to happen doesn't, and when something that isn't supposed to happen actually does?

And so we arrive at threat modeling, a process by which potential threats, such as structural vulnerabilities, can be identified, enumerated, and prioritized. STRIDE is a threat classification model developed at Microsoft for thinking about computer software vulnerabilities:
  • Spoofing: pretending to be someone or something else, e.g., spoofing an identity to impersonate a user
  • Tampering: modifying data in transit or at rest
  • Repudiation (denial of the truth or validity of something): denying having performed an action, such as one that harmed the flow, when the system cannot prove otherwise
  • Information disclosure: exposing secret data; the flip side of tampering
  • Denial of service: a malicious actor restricting usage by legitimate users
  • Elevation of privilege: gaining capabilities beyond those granted, which is the end goal of most attackers, e.g., by changing how the system interprets data
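For quick reference, the six categories can be kept as an enumeration. Each category is paired with the security property it violates (authentication, integrity, non-repudiation, confidentiality, availability, authorization). This is the commonly cited mapping from Microsoft's STRIDE material. It is added here as an aid and is not stated in the list above.

```python
from enum import Enum


class Stride(Enum):
    # Each category paired with the security property it violates.
    SPOOFING = "Authentication"
    TAMPERING = "Integrity"
    REPUDIATION = "Non-repudiation"
    INFORMATION_DISCLOSURE = "Confidentiality"
    DENIAL_OF_SERVICE = "Availability"
    ELEVATION_OF_PRIVILEGE = "Authorization"

    @property
    def violates(self) -> str:
        return self.value


# The first letters of the member names spell out S-T-R-I-D-E.
for category in Stride:
    print(f"{category.name[0]}  {category.name:<24} violates {category.violates}")
```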

To further help engineers understand the mitigation strategies required, Microsoft also came up with the Elevation of Privilege card game: go through each STRIDE category one by one, and try to see if there are any vulnerabilities that can be targeted.
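The game's "one category at a time" walk-through can also be turned into a checklist generator. This sketch is an assumption and not the game itself. It applies the commonly cited STRIDE-per-element chart to the DFD element types (external interactors, processes, data stores and data flows), and the element names are hypothetical.

```python
STRIDE_NAMES = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information disclosure",
    "D": "Denial of service",
    "E": "Elevation of privilege",
}

# Categories usually considered for each DFD element type, following the
# commonly cited "STRIDE per element" chart. Data stores that hold audit
# logs are usually also checked for repudiation.
APPLICABLE = {
    "external interactor": "SR",
    "process": "STRIDE",
    "data store": "TID",
    "data flow": "TID",
}


def checklist(elements: list[tuple[str, str]]) -> list[str]:
    """Walk each (name, kind) element through its applicable categories."""
    return [
        f"{STRIDE_NAMES[letter]}: {name}"
        for name, kind in elements
        for letter in APPLICABLE[kind]
    ]


for item in checklist([("End user", "external interactor"),
                       ("Browser -> Web server", "data flow")]):
    print(item)
```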

In the next posts, I'm going to talk about Data Flow Diagrams, DREAD, and mitigation of threat models.

Sunday, November 20, 2016

Principles for great mobile app development

For the past few days, I've been reading up on mobile development principles, and on how user behaviour on mobile differs from the web. Though I manage product roadmaps for Android and iOS apps for hoteliers at Goibibo, I feel that the mobile world moves so fast that most learning becomes obsolete in a matter of months. As such, the objective of this post is to condense various mobile development principles that I believe will still be relevant years from today.

Given that touch screen mobiles have been around for only 10 years, it is surprising how much behavioural change mobile is bringing to human interactions. From notifications for new messages, to instant gratification on e-commerce apps, to becoming healthier with the plethora of apps available, mobile is touching us all in a way that past technologies weren't able to. It is hard to imagine that a machine 20X smaller and 10X faster than the desktop computer I was using 10 years ago is in the hands of the majority of people. If anyone in 2007 had told me that this would be the case in 2016, I wouldn't have believed them.

But anyway. Much of the greatness of mobile lies in the app ecosystems that have been built around the 2 dominant app development platforms, Android and iOS. So here are some principles that I believe will continue to apply to mobile development:
  1. Support snackable usage
    Given that mobile has limited screen real estate, users can go through a lot more content with a higher attention span compared to the web. Users can also check your application multiple times a day, at their convenience. Hence, design your applications knowing that a mobile session will, on average, be shorter than a desktop session, but more engaging.
    Make your interactions bite-sized, so that each one stands out as a complete experience and lets users finish tasks in a small amount of time.

  2. Prefer flows over static information
    While web pages are the easiest way to pack in more information, mobile suffers from latency and lack of screen real estate when there is too much information. So a better measure of efficiency is how long it takes to complete a task. It is thus important to keep the user in a fluid state of mind, and to ensure that s/he isn't distracted often.

  3. Ensure that your users keep going
    Anyone can hit a dead end, but you don't want your users to think they did something wrong. So as an important principle, don't ask them to do the same task again; let them continue. Any flow that has a dead end should be made more fluid and natural.

  4. Use familiar patterns
    Your users already interact with multiple mobile apps on a day-to-day basis, so patterns they already know make your app easier to use. There is a reason why developers still use PHP for new web development: it has a large user base, so familiarity and understanding are good enough. Using existing patterns means your users have fewer things to learn, making your apps all the more convenient.

  5. Create effortless and delightful experience
    There has to be a healthy amount of focus on ensuring retention of existing users, and it is thus important to ensure that the flows that a user experiences are delightful (leads into a reward) and effortless (doesn't require one to think too much). It is important to use analytics like crazy to identify where your users drop off.

  6. Utility > Design
    If you ever have to choose between utility and design, choose utility. Great design is all about being naturally convenient, supporting users to achieve their work with minimal friction. If you run into such a clash, it is worth rethinking your design strategy and how you want to present the feature. This is not a license to completely ignore design. It is about focusing on the utility of the product, because that is what drives long-term retention. If a cake doesn't taste good, no amount of icing or decoration will save it.
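Point 5 says to use analytics to find where users drop off. In practice, this usually comes down to comparing how many users reach each step of a flow. Here is a minimal sketch that assumes you already have per-step counts from your analytics tool. The step names and numbers below are hypothetical.

```python
def drop_off(funnel: list[tuple[str, int]]) -> list[tuple[str, str, float]]:
    """For each consecutive pair of steps, return the fraction of users lost."""
    report = []
    for (step, users), (next_step, next_users) in zip(funnel, funnel[1:]):
        lost = (users - next_users) / users if users else 0.0
        report.append((step, next_step, lost))
    return report


# Hypothetical counts of users reaching each step of one flow.
funnel = [("Open app", 1000), ("Search", 600), ("Details", 300), ("Checkout", 240)]

for step, next_step, lost in drop_off(funnel):
    print(f"{step} -> {next_step}: {lost:.0%} dropped off")

worst = max(drop_off(funnel), key=lambda row: row[2])
print("Biggest drop-off:", worst[0], "->", worst[1])
```

The step with the largest drop-off is usually the first candidate for making the flow more effortless.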

Hope that helps.

Saturday, June 25, 2016

Information Security: Output Encoding and Error Messaging

Continuing from my last post on the importance of input validation in information security, I will discuss output encoding and error messaging from the point of view of system development. Again, for ease of reading, this is structured as a set of questions and answers that follow a conversation:
 
  • What is output encoding?
    Output encoding is the process of converting information or instructions emitted by a component or service into a particular format. The underlying idea is that output, once emitted, may pass through multiple intermediate hops before reaching its final destination, and encoding it properly leaves little scope for attacks.

  • What happens due to improper output encoding?
    Improper or missing output encoding is the root of many implementation flaws, since data that is safe in one context may not have been encoded for safety in another. It is important to remember that most interpreted languages and structured document formats like HTML or XML hold information in a specified structure, which can break when data is not encoded for it.

  • What happens if the structure is not adhered to?
    When the structure is not adhered to, injection flaws can occur as data is passed from one interpreted structure to another system. This opens the door to vulnerabilities like XSS, SQL injection, and other such threat vectors.

  • How to ensure that output is properly encoded?
    For proper encoding, it is important to study the structure of your output and identify the troublesome characters that can be used to break out from data into structure. Delimiting characters in strings, and escape characters (used to encode dangerous or incompatible characters), are high on the list. Any other delimiters introduced by your systems and contracts should also be checked for proper encoding.

  • How should one proceed with encoding?
    Encoding of troublesome characters should always be done using a standard library. It is a best practice to encode anything that may be dangerous in nature, i.e., anything that is not alphanumeric.

  • Are there any testing strategies for proper output encoding?
    Yes, definitely. You should thoroughly test output encoding with unit tests for your validation and encoding functions. Every input field, hidden or visible, needs to be covered in the test cases. It is recommended that these tests be run regularly, since downstream and upstream systems may change often.

  • What is meant by Error Messaging?
    Typically, hackers and attackers gain additional insights from the kind of error messages a system provides. It is therefore important to provide only the relevant information, keeping the messaging generic enough not to give an attacker any additional information. For example, consider the use case of recovering an account by email. Either the entered email ID has an account, or it does not, and a system could display "Instructions successfully sent to email ID" or "Account does not exist" respectively. However, a generic message along the lines of "If an account exists, instructions will be sent to that email ID" works better, as it does not give the attacker any additional ammo.

  • How do we repel attackers effectively?
    To repel attackers, give them no additional ammunition, as in the example above. Attackers focus on finding exceptional conditions within the system that they can exploit, and the text of error messages helps users, developers, and attackers alike. In practice, this translates into gracefully handling all exceptions and keeping error messages generic.

  • So what is the right time to do proper Error Messaging?
    Having the right error messaging is a design-time problem. It is important to define the error codes or exceptions that your systems or services can emit early on, during the design of your components. It is equally important to define conventions on when and how errors are to be caught, and to enforce them through extensive code reviews. Another best practice is to ensure errors are always handled in a consistent fashion.
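Both ideas can be sketched with Python's standard library. `html.escape` encodes untrusted text for an HTML context, and `urllib.parse.quote` encodes the same kind of text for a URL. The recovery function returns one generic message whether or not the account exists. The function names, the URL path and the message are hypothetical illustrations.

```python
import html
from urllib.parse import quote


def render_comment(comment: str) -> str:
    """Encode untrusted text for an HTML body context before embedding it."""
    return f"<p>{html.escape(comment, quote=True)}</p>"


def search_link(term: str) -> str:
    """The same text needs different encoding inside a URL query value."""
    return f"/search?q={quote(term, safe='')}"


GENERIC_RECOVERY_MESSAGE = (
    "If an account exists, instructions will be sent to that email ID."
)


def recover_account(email: str, registered: set[str], send_instructions) -> str:
    """Answer identically whether or not the account exists."""
    if email in registered:
        send_instructions(email)
    return GENERIC_RECOVERY_MESSAGE


print(render_comment('<script>alert("hi")</script>'))
# <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
print(search_link("a b&c=d"))
# /search?q=a%20b%26c%3Dd
```

The two encoders show why encoding must match the output context. Text that is safe inside HTML may still break a URL, and vice versa.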
So far, we have covered how to make sure that a) our systems get validated inputs, b) they emit encoded outputs and c) they don't provide additional information to attackers by having the right error messaging. In the next few posts, we will look at threat models.

Sunday, June 12, 2016

Information Security: Input validation

Working at startups, I've realized that security is one of the last things on a developer's mind: as long as the system hasn't been hacked, it doesn't seem to matter. So recently, while researching more on security, I came across a certification called CISA, or Certified Information Systems Auditor. It is an industry standard certification issued by ISACA for people in charge of ensuring that an organization's IT and business systems are monitored, managed and protected.

Reading up more on the examination, I came across certain information security concepts that I believe are applicable to any system design in general. So I will discuss them in detail in this series of posts on Information Security.

Let us begin with input validation - it is the first line of defense that you have against any threatening actor for your systems and services. To make the post more readable for anyone new to the topic, I've structured it as a list of questions and answers that follow a conversation:
  • The first question that arises in our mind is: what is input validation?
    Input validation is the process of ensuring that data that has been passed is both correct and useful for the purpose for which it is being collected.

  • Why is input validation important?
    Input validation is important because, when not done right, it leaves applications vulnerable. Exploits like buffer overflows, directory traversal, cross-site scripting and SQL injection are just a few of the attacks that can result from improper data validation.

  • Where should we validate input?
    People are often confused about where to validate inputs: on the client side or on the server side. It is important to remember that JavaScript can be disabled on the client side, and requests can be crafted without a browser at all. So it is best to validate inputs on both the client and the server, treating the server-side checks as the ones that actually enforce security.

  • What data should be validated?
    It is important to validate all data received from a user. While the average user may not be malicious, remember that they may be accessing your products and services from a compromised system or network. This means all form data, hidden fields, cookie data, HTTP headers and anything else of importance within the HTTP request should be validated.

  • What all should we validate from the input?
    It is important to remember that input has meaning only when it is in an interpretable format, since it may have to be transferred over the wire in custom formats. Therefore, both the syntax and the semantics of the input must be verified.

  • What all should be done while performing syntactic validation?
    For syntactic validation, it is important to
    • Identify and validate the structure of input - what all goes into it and what does not
      • The structure of any special symbols needs to be enforced
      • The input must follow the expected syntax
    • Standardize the encoding - it could be base64, or any custom implementation based on data being sent

  • What should happen to other inputs?
    Anything that does not pass strict syntactic validation should be rejected. Common validations include checking that bounds are respected, that numbers, text and text length fall within acceptable ranges, and that dates and other data follow the specified format.

  • What should be done during semantic validation?
    Semantics relates to meaning in language or logic. As such, one needs to check not only the structure of the data, but also its meaning. For example, if an API accepts dictionaries, it is important to validate that the right kind of dictionaries are being passed around, and not just any dictionary with arbitrary data fields.
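Here is a minimal sketch that puts syntactic and semantic validation together for a hypothetical hotel booking payload. The field names, limits and the `ValidationError` type are assumptions made for illustration. The structure is what matters. First check the allowed fields, types, lengths, ranges and date format. Then check that the values make sense together, and reject anything that fails rather than trying to repair it.

```python
from datetime import date

# Hypothetical allowlist of fields for a booking request.
ALLOWED_FIELDS = {"name", "guests", "check_in", "check_out"}


class ValidationError(ValueError):
    pass


def validate_booking(payload: dict) -> dict:
    # Syntactic checks: structure, types, lengths, ranges and formats.
    if not isinstance(payload, dict):
        raise ValidationError("payload must be an object")
    if set(payload) != ALLOWED_FIELDS:
        raise ValidationError("unexpected or missing fields")
    name = payload["name"]
    if not isinstance(name, str) or not 1 <= len(name) <= 100:
        raise ValidationError("name must be 1-100 characters")
    guests = payload["guests"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(guests, int) or isinstance(guests, bool) or not 1 <= guests <= 10:
        raise ValidationError("guests must be an integer between 1 and 10")
    try:
        # ISO dates such as 2016-06-12; non-strings raise TypeError.
        check_in = date.fromisoformat(payload["check_in"])
        check_out = date.fromisoformat(payload["check_out"])
    except (TypeError, ValueError):
        raise ValidationError("dates must be ISO formatted") from None
    # Semantic check: well-formed values must also make sense together.
    if check_out <= check_in:
        raise ValidationError("check_out must be after check_in")
    return {"name": name, "guests": guests, "check_in": check_in, "check_out": check_out}


print(validate_booking({"name": "Asha", "guests": 2,
                        "check_in": "2016-06-12", "check_out": "2016-06-14"}))
```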

So, if you have done your input validations right, you are already safe from a large number of attacks that come from accepting incorrect inputs.