Saturday, February 24, 2018

Big Data 101 - What Is Big Data And Why Hadoop?

Like I mentioned in one of my previous posts, I'm exploring the big data ecosystem. In this post, I will briefly talk about big data and Hadoop, and why they are needed. I'm envisioning this as a series of 4 blog posts, and here goes the first one.


What is big data?
What counts as a huge amount of data today may not be considered big a few years from now. So, there are 3 important vectors to check when deciding whether a problem needs a big data solution:
  • Volume of data
  • Velocity at which data is being generated: depends on the growth rate
  • Variety of data: structured vs unstructured vs multi-factored vs linked vs dynamic
Example: site analytics, clickstream data, etc. can all be considered big data problems.

Big data comes with big problems
  • We need efficiency in storage since data volume/velocity/variety is high
  • Data losses due to corruption and hard disk failures get magnified when working with big data, and accordingly, the recovery strategies need to be adapted.
  • The time it takes to analyze data also goes up significantly, thus requiring better techniques for analysis
  • Finally, the monetary cost of analysis also shoots up due to huge storage & computation needs
As such, traditional RDBMS solutions don’t help.
Grid computing approaches don’t help either.
When the amount of data is huge, nodes may end up spending too much time on data transfer. While grid computing works well for compute-heavy analysis on smaller amounts of data, it requires low-level programming and thus may not prove as efficient.
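
To get a feel for why data transfer becomes the bottleneck, here is a rough back-of-the-envelope sketch. The 1 TB dataset and the 1 Gbit/s link are illustrative assumptions, not measurements from any real cluster, and the helper name transfer_seconds is made up for this example. It also ignores protocol overhead, so real transfers would likely be slower.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


# Hypothetical numbers: a 1 TB dataset and a 1 Gbit/s network link.
dataset_bytes = 10**12
link_speed = 10**9

seconds = transfer_seconds(dataset_bytes, link_speed)
print(f"{seconds:.0f} seconds, about {seconds / 3600:.1f} hours")
```

Under these assumptions, just shipping the data to a compute node takes over two hours before any analysis starts, which is why moving the computation to where the data lives is attractive.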

Hadoop was built to overcome the above shortcomings
Its key features are:
  • Is cost effective
  • Can handle huge volumes of data
  • Is efficient in storage
  • Has good recovery solutions
  • Scales horizontally
  • Minimizes the learning curve

So is Hadoop better than other databases?
Well, it depends on the use case. There are some use cases where RDBMS solutions like MySQL, PostgreSQL, MSSQL, etc. shine, and others where Hadoop is the better alternative. In general,
  • RDBMS work exceptionally well with low-volume data, while Hadoop works well with larger datasets
  • RDBMS models use a static schema, while Hadoop allows dynamic schemas
  • RDBMS can scale vertically (you can upgrade the machine itself) but typically don’t scale horizontally (you can’t easily improve the performance of a query by adding more nodes)
  • Database solutions require dedicated servers, which can get expensive quickly, while Hadoop runs on commodity computers
  • Hadoop is a batch processing system, not an interactive one, so you can’t expect millisecond latencies. Thus for most practical purposes where you need to return a response quickly, Hadoop won’t be the ideal choice.
  • Hadoop encourages you to write data once into storage and analyze it multiple times, while databases support reading and writing multiple times.
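
The "dynamic schema" and "write once, analyze many times" points can be illustrated with a small sketch. This is plain Python, not a Hadoop API: the sample records and the read_with_schema helper are hypothetical, and the idea shown (storing raw records and applying a schema only when reading) is one common interpretation of dynamic schemas, often called schema-on-read.

```python
import json

# Raw records are stored as-is; nothing enforces a fixed set of columns.
raw_lines = [
    '{"user": "alice", "page": "/home"}',
    '{"user": "bob", "page": "/cart", "referrer": "ad-42"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep the wanted fields, default to None."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


# Two analyses read the same stored data with different schemas.
page_views = list(read_with_schema(raw_lines, ["user", "page"]))
referrers = list(read_with_schema(raw_lines, ["user", "referrer"]))
print(page_views)
print(referrers)
```

The stored data is written once and never modified; each analysis decides which fields it cares about. A static-schema database would instead reject or force a migration for the record with the extra referrer field.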

It is important to note here that newer databases like Cassandra and DynamoDB can process huge volumes of data (millions of columns and billions of rows) and give RDBMS competition. They still have limitations when querying on fields other than the primary key and secondary indexes, but for many practical purposes they can replace the RDBMS variant.

So what is Hadoop?
Hadoop is a framework for distributed processing of large data sets across clusters of commodity computers (nodes). All the nodes that we need are commodity hardware: enterprise grade servers that need no customisation and can thus be bought off the shelf as is. In the world of cloud computing, these nodes can sit inside a VPC as well.

Hadoop has two core components:
  • HDFS (Hadoop Distributed File System): Takes care of all storage related complexities, such as which data goes where and how data is replicated. HDFS is a virtual file system layered on top of each node’s local file system, so the two co-exist
  • MapReduce: Takes care of all computation related complexities
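
To make the MapReduce idea a little more concrete, here is a minimal single-process sketch of a word count, the usual introductory example. It is plain Python and does not use Hadoop's actual API; the function names map_phase, shuffle and reduce_phase and the sample input are assumptions made for illustration. In a real cluster, the map and reduce steps would run as separate tasks on different nodes, and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs big storage", "hadoop stores big data"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across many nodes, which is what lets this style of computation scale horizontally.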

In the next posts, we will explore HDFS, MapReduce and the Hadoop ecosystem in detail.
