Saturday, February 24, 2018

Big Data 101 - What Is Big Data And Why Hadoop?

As I mentioned in one of my previous posts, I'm exploring the big data ecosystem. In this post, I will briefly talk about big data and Hadoop, and why they are needed. I'm envisioning this as a series of 4 blog posts, and this is the first one.


What is big data?
What counts as a huge amount of data today may not be considered big a few years from now. So, there are 3 important vectors to check when deciding whether a problem needs a big data solution:
  • Volume of data
  • Velocity at which data is being generated: depends on the growth rate
  • Variety of data: structured vs unstructured vs multi-factored vs linked vs dynamic
Example: site analytics, clickstream data, etc. can all be considered big data problems.

Big data comes with big problems
  • We need efficiency in storage since data volume/velocity/variety is high
  • Data losses due to corruption and hard disk failures get magnified when working with big data, and accordingly, the recovery strategies need to be adapted.
  • The time it takes to analyze data also goes up significantly, thus requiring better techniques for analysis
  • Finally, the monetary cost of analysis also shoots up due to huge storage & computation needs
As such, traditional RDBMS solutions don’t help.
Grid computing approaches don’t help either.
When the amount of data is huge, nodes may end up spending too much time on data transfer. While grid computing works well for compute-heavy analysis on smaller amounts of data, it requires low-level programming and thus may not prove as efficient.
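
To get a rough feel for the data transfer problem, here is a back-of-the-envelope sketch. The 1 TB data size and the 1 Gbit/s link speed are assumptions I picked purely for illustration (they are not measurements), and the estimate ignores protocol overhead and contention.

```python
def transfer_seconds(data_bytes, link_bits_per_second):
    """Idealised time to move data over a network link (no overhead)."""
    return data_bytes * 8 / link_bits_per_second


ONE_TB = 10**12      # assumed data size: 1 terabyte, in bytes
ONE_GBIT = 10**9     # assumed link speed: 1 gigabit per second

seconds = transfer_seconds(ONE_TB, ONE_GBIT)
print(f"{seconds:.0f} s (~{seconds / 3600:.1f} h)")  # prints "8000 s (~2.2 h)"
```

Even under these generous assumptions, a node would spend over two hours just receiving the data before any analysis starts, which is why moving the computation to where the data lives is attractive.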

Hadoop was built to overcome the above shortcomings
Its key features are:
  • Is cost effective
  • Can handle huge volumes of data
  • Is efficient in storage
  • Has good recovery solutions
  • Scales horizontally
  • Minimizes the learning curve

So is Hadoop better than other databases?
Well, it depends on the use case. There are some use cases where RDBMS solutions like MySQL, PostgreSQL, MSSQL, etc. shine, and others where Hadoop is the better alternative. In general,
  • RDBMS work exceptionally well with low-volume data, while Hadoop works well with larger datasets
  • RDBMS models have a static schema, while Hadoop allows dynamic schemas
  • RDBMS can scale vertically (you can move to a more powerful machine) but won’t readily scale horizontally (you can’t easily improve the performance of a query by adding more nodes)
  • Database solutions often require dedicated servers, which can get expensive quickly, while Hadoop runs on commodity computers
  • Hadoop is a batch-oriented system, not an interactive one, so you can’t expect millisecond latencies. Thus, for most practical purposes where you need to return a response quickly, Hadoop won’t be the ideal choice.
  • Hadoop encourages you to write data once into storage and analyze it multiple times, while databases support reading and writing the same data multiple times.
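
One way to picture the dynamic schema point from the list above is what is often called schema-on-read: the raw records are stored as-is, and a structure is applied only when they are read. The following is a minimal sketch in plain Python, not Hadoop code; the sample records and the `read_with_schema` helper are hypothetical names made up for illustration.

```python
import json

# Hypothetical raw clickstream records; note that the fields differ per record.
raw_records = [
    '{"user": "a", "page": "/home"}',
    '{"user": "b", "page": "/cart", "referrer": "search"}',
    '{"user": "c", "device": "mobile"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep the requested fields, default to None."""
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


# The same stored data can be read again later with a different set of fields.
pages = list(read_with_schema(raw_records, ["user", "page"]))
```

A static schema would have forced a decision about `referrer` and `device` before the first record was written; here that decision is deferred to whoever analyzes the data.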

It is important to note here that newer databases like Cassandra and DynamoDB can process huge volumes of data (millions of columns and billions of rows) and give RDBMS solutions competition. They still have limitations when querying on fields other than the primary and secondary indexes, but for many practical purposes they can replace an RDBMS.

So what is Hadoop?
Hadoop is a framework for distributed processing of large data sets across clusters of commodity computers (nodes). All the nodes that we need are commodity hardware: standard servers that need no customisation and can thus be bought off the shelf as is. In the world of cloud computing, these nodes can sit inside a VPC as well.

Hadoop has two core components:
  • HDFS (Hadoop Distributed File System): Takes care of all storage-related complexities, such as deciding which data goes where and replicating data. HDFS is a virtual file system, so the local file system and HDFS co-exist on each node
  • MapReduce: Takes care of all computation-related complexities
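
To make the two components a little more concrete, here is a toy sketch in plain Python. It is not Hadoop code and does not use any Hadoop API: every function name is hypothetical, the round-robin replica placement is a simplification I chose for illustration, and everything runs in one process. It only mimics the ideas, namely splitting data into blocks, keeping several copies of each block on different nodes, and counting words with a map step followed by a reduce step.

```python
from collections import defaultdict

REPLICATION = 3  # assumption for this sketch; HDFS also defaults to 3 replicas


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks (a stand-in for HDFS blocks)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, block_size=2):
    blocks = split_into_blocks(lines, block_size)
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return reduce_phase(shuffle(mapped))


lines = ["big data big", "hadoop stores data", "hadoop"]
print(place_replicas(2, ["n1", "n2", "n3", "n4"]))
print(word_count(lines))
```

In a real cluster the map step would run on the nodes that already hold each block, so only the small (word, count) pairs travel over the network rather than the raw data.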

In the next posts, we will explore HDFS, MapReduce and the Hadoop ecosystem in detail.
