Sunday, October 21, 2018

Book Review: Indira - India’s Most Powerful Prime Minister

There has been a lot of brouhaha in the press that the current situation in India is comparable to the Emergency years of the 1970s. Debates around the right to privacy under Aadhaar, cross-border surgical strikes, the appointment of judges to the apex court, and patriotism in an environment of mob lynchings often end in cries of growing authoritarianism and a curbed press. Intellectuals routinely associate such generalizations with India's Emergency era.
I've been curious for a while about the Emergency and the polity of India before it, so picking up a book about the person at the center of it all - Indira Gandhi - seemed the best way to learn more. For a contemporary account, I picked up this book by Sagarika Ghose.
Important disclaimer: This book is a biography of Indira Gandhi, so naturally, it tries to paint a rosy picture of her achievements and choices while shedding limited light on her shortcomings and failures. The author has also intermittently inserted narrative sections of her own that eulogize Mrs Gandhi - these make for a boring read because of their patronizing nature. Other than these two issues, the book makes for a good account of the events from the mid-1960s to the mid-1980s.
The book talks extensively about her multiple prime ministerial terms, and about those of Morarji Desai, Nehru, and Shastri as well. It goes into the details of the Emergency, and also sheds light on the role Sanjay Gandhi played during it, and how he rose to prominence. But as many have pointed out, it is silent on several topics, like her nuclear agenda and the establishment of RAW.
Overall, I think the book makes for an interesting read only if you are not already familiar with the topics. It would have been much better had the author included more facts and events, and cut down on the eulogies injected throughout. Still, the book provides a very good outline for understanding the decades that overlapped with Indira Gandhi's prime ministership. Overall Rating: 3/5

Sunday, May 13, 2018

Flipkart - Walmart deal: What Snapdeal lost, and lessons for entrepreneurs

The market is abuzz with the Walmart-Flipkart deal that happened last week. And why shouldn't it be: at $16bn, this is the biggest acquisition deal ever to have happened in India. The fact that the Fortune #1 company is involved, investing in an e-commerce sector it couldn't crack on its own, in an emerging economy like India, is only going to help investor sentiment for Indian companies in general. But I am not writing this post to cover the deal - there is enough coverage out there already.

Note: Back-of-the-envelope calculations follow, with a lot of hindsight knowledge.

As I was reading through the news, I couldn't help wondering how Flipkart's valuation has almost doubled from $11.6bn in July 2017 to $22bn in less than a year. And how Snapdeal has missed the proverbial bus, spectacularly.

Here is a breakup of all funding raised by Snapdeal (all figures in USD):
$12M Nexus Venture, Indo-US (Kalaari) Venture Partners
$45M Bessemer and existing
$50M eBay and existing
$75M Softbank
$133M eBay, Kalaari Capital, Nexus Venture, Bessemer, Intel Capital and Saama Capital
$105M BlackRock, Temasek Holdings, Premji Invest and others
$647M Softbank
$500M Alibaba, Foxconn and SoftBank
$200M Ontario Teachers' Pension Plan
$17.5M (INR 113 crore) Nexus Venture
Total: ~$1.8 billion overall
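The total above is easy to sanity-check in a few lines of Python, using the round figures quoted in this post:

```python
# Funding rounds raised by Snapdeal, in millions of USD,
# using the approximate figures listed above.
rounds = [12, 45, 50, 75, 133, 105, 647, 500, 200, 17.5]

total_millions = sum(rounds)
print(f"Total raised: ~${total_millions / 1000:.1f} billion")  # → Total raised: ~$1.8 billion
```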

Once valued at $6.5bn, Snapdeal had been offered a $950M payout by Flipkart in June 2017. Had Snapdeal agreed, Flipkart would have had a better cumulative market share of around 45%, compared to the 34% it has currently (at that time, Flipkart and Snapdeal stood at 37% and 14% respectively). It would also have had a wider reach amongst sellers, at least around 300K versus the 100K it is believed to have currently - Snapdeal was a pure marketplace play, and it alone had 300K sellers. Finally, Snapdeal wouldn't have needed to sell FreeCharge, UniCommerce, and Vulcan Express to keep itself financially afloat. Without the sale at a steep discount of $60M against the purchase price of $400M, FreeCharge would have been a readily available platform to complement the UPI-based PhonePe.

Given that Walmart's deal would have included the private valuations of Myntra and Jabong within the final numbers, it's anyone's guess how having Snapdeal in the fold would have helped investors get an even better valuation for the Flipkart group of companies. Even with just the doubling of valuation that already happened, Snapdeal's potential value would stand at ~$1.8bn today.

My optimistic guess is that the deal would have happened at a $25-26bn valuation had Snapdeal also been included, since having Snapdeal in its kitty would have made Flipkart the market leader by a huge margin (compared to just the 5-7% lead it currently has over Amazon), with a stronger seller base - giving an ~2.3x multiplier.

In terms of personal fortunes made, the founders of Snapdeal were reported to be making ~INR 250 crore at the time of the deal, having already made ~INR 150 crore from a previous stake sale. The Walmart deal would have meant that within a year of their stay at Flipkart, they would be getting double the value they were initially receiving. Given that Walmart is not keen on retaining non-core founders (Sachin Bansal is set to exit), it could have been an easy way out for the Snapdeal founders as well. And Snapdeal's employees wouldn't have needed to be laid off, since Flipkart would have retained most of them.

Finally, the investors would stand to exactly recover the base investments they made in Snapdeal. Investors, after all, like everyone else, want a good return on their money. The Snapdeal founders deciding at the last minute to kill the deal didn't really help anyone, except probably themselves. It's telling that a stake once projected to be worth $450M is now being sold for INR 40 crore - almost 1/60th of the price - when it could have been a very different story for the first investors who put their trust and money in the company.

While Snapdeal isn't dead yet - it actually reported an increase in the number of transactions - to me, there are 3 important lessons here for all entrepreneurs to remember:
  1. Never underestimate the deal making abilities of a Power Investor, in this case, Softbank (PowerInvestor:Investing::10XProgrammer:Coding)
  2. Good things happen to those who wait. Shortsightedness can (literally) prove costly in the startup world
  3. While finishing first is best, a place on the podium is still worth more than being an also-ran.
PS: I'm an ex-Amazon techie, but I wasn't high enough up the food chain to know any of the sensitive market-penetration details or strategies involved. All the content here is my opinion alone, built from publicly available information.

Thursday, May 10, 2018

AWS Summit 2018, Mumbai

Today, I attended the AWS Cloud Summit in Mumbai. Held at the massive Bombay Exhibition Center, the event was attended by over 3,000 participants by my estimate. Overall, there were 6 different tracks for the talks: 
  • Build: Building on AWS
  • Scale: Scaling your AWS
  • Secure: Securing your position in the cloud
  • Migrate: Migrating from on-prem to Cloud
  • Innovate: Innovation with the Cloud
  • Impact: Cloud in Public Sector & Education for digital India
Based on my current interests, I figured that Migrate would be far-fetched for me, and the talks under Secure seemed completely promotional. So I made a list of the ones whose descriptions I liked, and here are the talks I attended, avoiding most schedule conflicts:

  • 11:30 – 12:10 <Scale> Optimizing Costs as You Scale on AWS
  • 13:30 – 14:10 <Build> Accelerate Business Innovation Using AWS Serverless Technologies
  • 14:00 – 14:30 <Impact> Addressing Risk and Compliance in Public Sector
  • 14:30 – 15:00 <Impact> Cloud Procurement in Public Sector - Making It Work
  • 15:00 – 15:30 <Impact> Smart Cities – The Journey Toward Greater Economic, Social & Environmental Achievement
  • 16:00 – 16:40 <Innovate> Building Engaging Voice Experiences with Amazon Alexa
  • 17:00 – 17:40 <Innovate> Data Driven Applications with AWS AppSync and GraphQL

You may notice that I attended a few talks from the Impact series as well. I felt that this track was also aimed at mid- and large-sized organisations, which can have elements of bureaucracy in their processes, and hence the content might be relevant. There was one more track on startups, held in an open lounge setup (rather than the conference setup for the others) - but I found its talks repetitive, plain copies of what the folks at the AWS booths were telling visitors anyway.

Since the AWS rep who confirmed my participation had informed me that registrations would begin at 7:45 AM, I left by 7 AM and managed to be there by 7:35 AM. Even though registrations were officially to start at 8, by the time I went in, the sweatshirts meant for the first 1,500 participants were gone. There was more swag from the various booths, though, and one at the end of the conference, so I guess it was ok.

This Summit had over 40 companies partnering at different levels. Some of the renowned ones included: Intel, VMware, Arista Networks, Dell EMC, Druva, Kaspersky Labs, MongoDB, SendGrid, SumoLogic, Talend, Knowlarity, and Kuliza.

Overall, I found the summit and its talks quite informative. My favorite talk, not surprisingly, was the first one: Optimizing Costs as You Scale on AWS. Having worked at multiple startups, and having tried my hand at a few ideas of my own, I believe AWS costs are something everyone tries to optimise sooner or later.

Among the booths, I really enjoyed visiting the ones under innovations. These stalls featured startups working on next-gen ideas, like Wattman by Zenatix for power consumption analysis, Imaginate for VR conferences, and Scapic for generating AR and VR content. Amongst the AWS booths, the one covering the EdTech program was really helpful. EdTech is an AWS initiative that helps EdTech startups less than 5 years old get access to credits, communities, and senior folks to make their product better. It's live only in the US right now, but will be launched in India soon, and is definitely something to watch out for.

Sunday, May 06, 2018

Alexa meetup: Designing Multimodal Skills

Yesterday, I attended a meetup on designing multimodal skills for Alexa, and in this post I'll share some of the interesting pointers from the presentation and discussion.

-> We are in the era of Voice UI

While terminals were the primary mode of interacting with computers when they first appeared in the 70's, systems have evolved over the years to support different interaction paradigms - from GUI, to Web, to Mobile. In a way, the 2010's are the era of the Voice User Interface (VUI).

Voice comes naturally to us - we have been using it for thousands of years to interact with one another. Voice is the next big computing platform.

-> Cloud enables experiences that were not possible earlier

While sentient chat systems and bots have been imagined forever, earlier efforts fell short because of the limited computing power available on the edge machine.

For example, designing an AI assistant like Alexa broadly involves many complex steps, like:

  • Speech Recognition
  • Machine Learning based Natural Language Understanding 
    • converting a user's utterances into an intent
  • Text to Speech

This was not possible earlier, when all the processing was done on the device. Cloud computing enables AIs like Alexa to flourish by offloading the heavy computation from the end device.

-> Multi Modal experiences are the way forward

Multi-modal experiences refer to applications where there are multiple modes of experiencing the skill. For example, with an Echo Spot, your users can have both voice and visual experiences.

While the focus with Alexa is always on voice-first apps, experiences can now also be augmented with the help of visual cues.

-> The introduction of multi-modal approaches calls for new design principles

While Alexa is not yet suited for cases where there are long lists of items, or complex nesting between them, there are some general design guidelines that can be followed:
  • design voice first - you just don't know if the user will have a visual feed or not
  • do not nest actions within list items - it becomes poor Voice UX
  • choose images that look great on all devices - while echo spot has a circular screen, an echo show has a rectangular screen
  • use font overrides sparingly, and markups in meaningful ways
  • a good way to design better Voice UX is to write the interactions down and read them out as a role-play - if it doesn't sound right, change it

The presentation was followed by a hands-on Alexa development session, where attendees created a fresh Alexa skill for space facts and deployed a pre-coded Lambda on the cloud from the serverless repository. This was a standard JSON-in, JSON-out kind of session, which helped familiarise participants with the Alexa developer portal and the Lambda deployment process.
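To give a flavour of that JSON-in, JSON-out model, here is a minimal sketch of what such a Lambda handler looks like (the intent name and fact list are hypothetical stand-ins, not the actual code from the session):

```python
import random

SPACE_FACTS = [
    "A year on Mercury is just 88 days long.",
    "Jupiter has the shortest day of all the planets.",
]

def lambda_handler(event, context):
    """Minimal Alexa-style handler: a JSON request comes in, a JSON response goes out."""
    intent = event.get("request", {}).get("intent", {}).get("name", "")
    if intent == "GetNewFactIntent":  # hypothetical intent name
        speech = random.choice(SPACE_FACTS)
    else:
        speech = "Try asking me for a space fact."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

The developer portal maps utterances to the intent name; the Lambda only ever sees and returns plain JSON, which is what made the session so approachable.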

The meetup ended with a presentation by team YellowAnt, who demoed the public-beta notifications feature of Alexa. YellowAnt is a chatops startup, and the gist of the demo was that Alexa now supports notifications in beta. These notifications can be leveraged to ping devops users about system updates (downtime, completed deployments, etc).

However, given that Alexa is a voice-first ecosystem, it was very interesting to hear the Alexa AI pronounce lengthy text and URLs character by character, and try reading multiple notifications one after the other. All of this would have made sense as strings in email or chat notifications, but it lost all context when delivered via voice. To me, this re-emphasized the need to design voice-first applications for Alexa.

Overall, I found the meetup very helpful in understanding the Alexa ecosystem, and learnt a lot of cool new things.

Friday, May 04, 2018

Book Review - Trump: The Art Of The Deal

While going through some news a couple of days back, I came across an article on a US team visiting China for trade negotiations. These talks were necessitated by the mutual embargoes the US and then China placed on each other's trade. In line with Donald Trump's many other statements, actions, and policies that fly in the face of conventional procedure and wisdom, the unilateral US move in March to impose trade sanctions on China had left most analysts dumbfounded.
This, coupled with his anti-immigration policies around restricting H1-B visas and the associated restrictions on the EAD (Employment Authorization Document), which are expected to hit the Indian IT workforce hard, piqued my interest. A simple question arose: why is this guy, Donald Trump, much vilified by US media during the elections and afterwards, able to take such unconventional decisions? (It is only recently that the negative PR he receives has started going down and he is gaining mainstream credence, due to the possibility of North Korea's denuclearization.)

Going through the list of books that could help me here, I zeroed in on Trump: The Art of the Deal, since he has writing credits for the book, and it would contain information on his business and personal lifestyle. The guys at Amazon delivered the book quickly, and as soon as I got my hands on it, I was lost in reading. The book has 14 chapters. It begins by recounting a week in his office (circa the 1980's), where he gives details of the business calls he made and received, and the gist of each. This is a very fascinating chapter, not because it's about Donald Trump, but because it offers an example of how the topmost guys in the food chain spend their time doing business. As a layperson, I've never come across anything similar - something that describes how a top executive works day in, day out, with plenty of juicy details. From the second chapter onwards, he talks about his business principles and his business dealings. Some of his observations regarding politicians and rich people are spot on. Though the book has a solid start, I found it losing steam as it progresses. There is a lot of talk about business transactions, some of which could be thought of as boastful, bordering on bullying.
In any case, I found the book a good read; some of the incidents it narrates were really insightful (not about Trump, but about the wealthy folks per se). I think it is definitely worth checking out. Overall rating: 4.5/5

Thursday, May 03, 2018

Stanford Log-linear Part-Of-Speech Tagger

Another day, another requirement. I was looking for projects when I came across one asking for an integration with the Stanford NLP POS Tagger. So here were four big words about which I obviously needed to do some research to understand them in detail.

A Google search for the exact term led me to the Stanford Natural Language Processing Group's site, which had this to say:
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.
Digging a step further, it seems an interface to it already ships with the nltk package. From the downloads section:
Python: NLTK (2.0+) contains an interface to the Stanford POS tagger.
This is the package known as the Stanford Log-linear Part-Of-Speech Tagger.

Why "log-linear"? Well, from Wikipedia:
A log-linear model is a mathematical model that takes the form of a function whose logarithm equals a linear combination of the parameters of the model, which makes it possible to apply (possibly multivariate) linear regression.
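Concretely, the log of each class's probability is a linear function of the feature weights, up to a normalizing constant - the familiar softmax form. Here is a toy sketch in pure Python, with made-up features and weights rather than anything from the actual Stanford tagger:

```python
import math

def log_linear_probs(features, weights, classes):
    """P(class | features) ∝ exp(sum of weights of the active (feature, class) pairs)."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in features) for c in classes}
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant
    return {c: math.exp(s) / z for c, s in scores.items()}

# Hypothetical weights for tagging the word "runs":
# the suffix "-s" weakly suggests a plural noun (NNS), but a preceding
# pronoun ("she runs") strongly suggests a 3rd-person verb (VBZ).
weights = {("suffix=s", "NNS"): 0.9,
           ("suffix=s", "VBZ"): 1.2,
           ("prev=PRP", "VBZ"): 1.5}

probs = log_linear_probs({"suffix=s", "prev=PRP"}, weights, ["VBZ", "NNS"])
```

With these weights, VBZ wins comfortably - which is the whole trick of the tagger: learn weights over rich contextual features, then pick the highest-probability tag.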

I think freelancing is a good idea once in a while - it helps one come across a multitude of technologies, and even basic reading on them helps one grasp the general direction the industry is moving in.

Monday, April 16, 2018

Offensive Security Certified Professional (OSCP)

I've been freelancing via various sites for some time now. Today I came across a listing which asked for an OSCP certification as a requirement. Initially, I thought this was some Oracle certification, but a quick Google search revealed it is quite different: OSCP stands for Offensive Security Certified Professional.

Going through the website, I learnt that OSCP is offered by an organization named Offensive Security. OffSec are a group of folks heavily invested in information security and penetration testing. They run a bunch of courses and certifications, one of which happens to be the OSCP.

I couldn't find any industry accreditation information though, so a quick Wikipedia search had this to reveal:

Offensive Security Certified Professional (OSCP) is an ethical hacking certification offered by Offensive Security company that teaches penetration testing methodologies and the use of the tools included with the Kali Linux distribution (successor of BackTrack). 
In 2015, the UK's predominant accreditation body for penetration testing, CREST, began recognising OSCP as equivalent to their intermediate level qualification CREST Registered Tester (CRT).

That was surely interesting to know :) 

Saturday, March 17, 2018

Book Review: The Colour of Magic and Purple Cow

A few days back, I had been to Blossoms, the used-books store on Church Street. I picked up copies of Purple Cow and The Colour of Magic, which I recently finished reading. So, here go the reviews for both of them:

The Colour of Magic is the first book in the Discworld series by Sir Terry Pratchett. While I don't intend to read all 41 books in the series (just getting hold of each of them would be a costly affair for me in India), I have always been curious about Pratchett's writings and finally picked up this one. The book follows its hero Rincewind, a failed wizard who tries his best not to be a hero, and the adventures he has with the tourist Twoflower, who has no sense of money or danger. The Discworld is set on a disc that rests on four elephants, who in turn stand on the back of a giant turtle. The Discworld has its own system of magic and physics. The book is a classic work - full of imagination, comedy, and goofiness. The story moves from one misfortune to another that Rincewind and Twoflower manage to survive, with Death occasionally failing to convince Rincewind to die. The only spoilsport: the book ends with a cliffhanger, which makes you want to read the next book. Overall: 4.5/5

Purple Cow is a book by Seth Godin wherein he talks about transforming your business by being remarkable. Overall, I found the book interesting, though there were definitely avenues where I would have liked to learn more. His analysis of how product launches and marketing worked in the television era is spot on, and makes for an interesting read on why they aren't as relevant today. Having been written in the early 2000's, though, the book misses ideas related to the internet and marketing over it. For example, he obviously doesn't talk about social media, or how virality is at times induced by the various social platforms of the day. One of the best parts of the book is that it is full of very meaningful questions one can ask oneself to understand whether or not one is being remarkable. And most of these questions apply to both physical and internet products that any business can come to use. However, the book does leave you wishing for more, since it only talks about ways to identify whether your product is remarkable - it is silent on methodologies one can use to come up with remarkable products. And to some extent, it borrows a lot of ideas from other books, so if you've already read those, it can come across as repetitive. Overall: 3/5

Saturday, February 24, 2018

Big Data 101 - What Is Big Data And Why Hadoop?

Like I mentioned in one of my previous posts, I'm exploring the big data ecosystem. In this post, I will briefly talk about big data and Hadoop, and why they are needed. I'm envisioning this as a series of 4 blog posts, and here goes the first one.

What is big data?
What is a huge amount of data today need not be considered big a few years from now. So, there are 3 important vectors to check to decide whether a problem needs a big data solution or not:
  • Volume of data
  • Velocity at which data is being generated: depends on the growth rate
  • Variety of data: structured vs unstructured vs multi factored vs linked vs dynamic
Example: site analytics, clickstream data, etc. can all be considered big data problems.

Big data comes with big problems
  • We need efficiency in storage since data volume/velocity/variety is high
  • Data losses due to corruption and hard disk failures get magnified when working with big data, and the recovery strategies need to be adapted accordingly.
  • The time it takes to analyze data also goes up significantly, thus requiring better techniques for analysis
  • Finally, the monetary cost of analysis also shoots up due to huge storage & computation needs
As such, traditional RDBMS databases don't help.
Grid computing approaches don't help either:
When the amount of data is huge, nodes may end up spending too much time in data transfer. While grid computing works well for compute-heavy analysis on smaller amounts of data, it requires low-level programming and thus may not prove as efficient.

Hadoop was built to overcome above shortcomings
Its key features are:
  • Is cost effective
  • Can handle huge volume of data
  • Efficient in storage
  • Has good recovery solutions
  • Is horizontally scalable
  • Minimizes learning curve

So is Hadoop better than other databases?
Well, it depends on the use case. There are some use cases where RDBMS solutions like MySQL, PostgreSQL, MSSQL etc shine, and then others where Hadoop is the better alternate. In general,
  • RDBMS work exceptionally well with low volume data, while Hadoop with larger datasets
  • RDBMS models are static schema while Hadoop allows dynamic schemas
  • RDBMS can scale vertically (you can improve the process itself) but won’t scale horizontally (can’t improve performance of query by adding more nodes)
  • Database solutions require dedicated servers, which can get expensive quickly; Hadoop runs on commodity computers
  • Hadoop is a batch-oriented system, so you can't expect millisecond latencies. For most practical purposes where you need to return a response quickly, Hadoop won't be the ideal choice.
  • Hadoop encourages you to write data once into the storage and analyze it multiple times, while databases support both read and write multiple times.

It is important to note here that newer databases like Cassandra and DynamoDB allow huge volumes of data to be processed - millions of columns and billions of rows - and give RDBMS competition. They still have limitations when querying on fields other than the primary and secondary indexes, but for many practical purposes, they can replace the RDBMS variant.

So what is Hadoop?
Hadoop is a framework for distributed processing of large data sets across clusters of commodity computers (nodes). All the nodes we need are commodity hardware - standard servers with no customisation needed, which can be bought off the shelf as-is. In the world of cloud computing, these nodes can sit inside a VPC as well.

Hadoop has two core components:
  • HDFS (Hadoop Distributed File System): Takes care of all storage-related complexities - which data goes where, replication, and so on. HDFS is virtual, so the local file system and HDFS co-exist
  • MapReduce: Takes care of all computation-related complexities
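To make the MapReduce side concrete, here is the classic word-count example, written Hadoop-streaming style as a pure-Python sketch. On a real cluster Hadoop would run the map and reduce phases across nodes; the sort-and-group step in the middle is the shuffle it performs for you:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in an input line."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for a single word."""
    return (word, sum(counts))

def word_count(lines):
    # Map: every line of input produces key-value pairs.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle: Hadoop sorts and groups pairs by key between map and reduce.
    pairs.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct key.
    return dict(reducer(word, (c for _, c in group))
                for word, group in groupby(pairs, key=itemgetter(0)))

print(word_count(["big data big problems", "big wins"]))
# → {'big': 3, 'data': 1, 'problems': 1, 'wins': 1}
```

The appeal of the model is that mapper and reducer are tiny, stateless functions - which is exactly what lets Hadoop spread them over hundreds of commodity nodes.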

In the next posts, we will explore HDFS, MapReduce, and the Hadoop ecosystem in detail.

Sunday, January 28, 2018

Ease and Impact matrix

I've been reading up on strategies for successful project management for a while now. Going through some material on the ideal things to do during the project initialization step, I came across another useful tool, the Ease and Impact matrix. A while back, I had covered the Responsibility Assignment Matrix, a relevant tool for managing the different responsibilities every stakeholder role may have, and the associated communication.

Coming back to the Ease and Impact matrix: when a new project is beginning, a lot of details are still unknown - for example, who the stakeholders are, what the success criteria are, who is responsible for delivering the project, who takes accountability for the developments, and so on. At this stage, defining the scope of the project is one of the most critical pieces, as it affects the timelines of every other part. The basic idea, then, is to brainstorm amongst the stakeholders to generate a list of ideas, rank each idea on the ease of doing it and the impact it creates for the organization, and chart the result to visualize the prioritization.

Now, those of you who have worked with bug tracking tools (like Bugzilla, JIRA, GitHub Issues, etc.) would be familiar with two fields that most tickets carry - ticket type (bug, task, epic, etc.) and priority (low to high to blocker). These fields help in the management of all development during the lifecycle of the project, and if one thinks about it, during any sprint, teams usually pick a mix of tickets (tech debt, critical hotfixes, regular development, maintenance, and so on) to satisfy all their stakeholders. Similarly, organizations regularly use the BCG matrix to evaluate how to invest within their portfolios, based on a matrix of product market share and market growth rate. The common theme with any such matrix solution remains picking the ideas that fall within the best quadrants.

In the case of the Ease and Impact matrix, the quadrants usually follow the pattern: 1) High impact, high ease of doing; 2) High impact, low ease of doing; 3) Low impact, high ease of doing; and 4) Low impact, low ease of doing. Depending on the strategic objectives and stakeholder needs, a set of ideas can thus be picked, making sure that the goals associated with the project are SMART (Specific, Measurable, Achievable, Relevant, and Time-bound).

To build your Ease and impact matrix, you would need to:
  1. Brainstorm with your stakeholders and sponsors to arrive at a list of ideas
  2. Identify the goal of the strategy - make sure you know the one most important measure that would indicate success or failure
  3. Rank each idea by the impact it has on the goal - you can use a 1-10 scale, or even estimate using t-shirt-sized buckets - and plot this data along the Y axis
  4. Further rank the ideas by the ease of doing them, making sure you use the same scale (1-10 numbers vs t-shirt sizing) as for impact - and plot this data along the X axis
  5. Make sure every idea/activity is mapped on the matrix
  6. Prioritise within each quadrant, on the basis of impact first, and then ease of doing:
    1. High impact, High ease of doing: Prioritise these activities as the first to do
    2. High impact, Low ease of doing: Spend more time in planning these activities, as these may otherwise result in poor outcome because of the complexity.
    3. Low impact, High ease of doing: Do these tasks if they are a must for a higher impact task
    4. Low impact, Low ease of doing: Deprioritize these activities brutally, and don't attempt them unless they are necessary for a high-impact task.
  7. In general, activities that are hard to do are the first to be rejected when support from project sponsors wavers.
  8. Depending on the scope of the project, you can even try planning activities in 90 day blocks, while planning the remaining years at the high level only.
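The ranking and quadrant steps above are simple enough to sketch in a few lines of Python (the idea names and 1-10 scores below are purely illustrative):

```python
def quadrant(impact, ease, threshold=5):
    """Map 1-10 impact/ease scores onto the four quadrants of the matrix."""
    hi_impact = impact > threshold
    hi_ease = ease > threshold
    if hi_impact and hi_ease:
        return 1  # do these first
    if hi_impact:
        return 2  # plan these carefully
    if hi_ease:
        return 3  # only if needed by a high-impact task
    return 4      # deprioritize brutally

# (idea, impact, ease) - illustrative scores from a brainstorm
ideas = [("self-serve onboarding", 8, 7), ("rewrite billing", 9, 2),
         ("new logo", 3, 9), ("migrate wiki", 2, 3)]

# Prioritise by quadrant first, then by impact, then by ease.
plan = sorted(ideas, key=lambda i: (quadrant(i[1], i[2]), -i[1], -i[2]))
```

Plotting impact on the Y axis against ease on the X axis gives the same picture visually; the sorted list is simply the reading order of the chart.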

And thus you have a very relevant framework/tool to help you remain focused and achieve due success.

Monday, January 08, 2018

Book review: As A Man Thinketh, The Very Best of Common Man

As a Man Thinketh is a short essay, written by James Allen, and published in 1903 as a small book. The subject of the book is the power and right application of thought.

Its title is influenced by the biblical verse: "As a man thinketh in his heart, so is he". I picked this book up on a whim - I was looking for some motivational quotes for a side project when I came across a suggestion to read it on one of the blogs. Googling the title brought forth stellar reviews, and I ordered it from Amazon. The good guys at Amazon delivered it the next evening (perks of a Prime account), and I kept it in my stash over the week, to be read on the weekend.

The book is a quick read, with 7 chapters in total. Working on a few personal projects these days, I found the overall language and content of the book inspirational - like how doubts and fears kill our chances of success, and how dreaming is important to achieve success in life. James Allen has some mesmerising observations, and I've listed 4 that I think are really profound:
  1. Thoughts of doubts and fear never accomplish anything, and never can.
  2. Men are anxious to improve their circumstances, but are unwilling to improve themselves, they therefore remain bound.
  3. Circumstance does not make the man, it reveals him to himself.
  4. Men do not attract what they want, but what they are.
Standing at 80 pages, this book is small enough to be read on a lazy Sunday evening, and yet set the right tone for the week(s) to come. Recommended for everyone.

The Very Best of The Common Man is a compilation of drawings by legendary cartoonist R. K. Laxman. The Common Man is a caricature that depicts various social and political situations from the point of view of the layman.

I got this book for its sentimental value - while growing up, the You Said It column in the Times of India was a great source of humour on the uncertainties in the political and economic environment of the day.

The book has some of the better drawings by Laxman, where he takes potshots at the habits of politicians and explores the pains a common man faces in his day-to-day interactions with the government.

The book is a good collectible for anyone who loves cartoons drawn by Laxman, but other than that, don't expect much more from it.

Sunday, January 07, 2018

HPL/SQL: Procedural SQL on Hadoop, NoSQL and RDBMS

I admit it - I've a lot to learn when it comes to the internals of big data technologies. Even though I've worked on many data science projects at Goibibo, my knowledge of the principles involved is very limited. So when this project happened, where we needed to build some data pipelines for business needs, I found the perfect opportunity to explore the Hadoop and Hive ecosystem. This blog post is about a new tool that I came across, HPL/SQL.

Years back, the database world was ruled by relational databases, chief amongst them being Oracle. And it had this great language for writing queries - PL/SQL. From the Oracle website,
PL/SQL is a procedural language designed specifically to embrace SQL statements within its syntax. PL/SQL program units are compiled by the Oracle Database server and stored inside the database. And at run-time, both PL/SQL and SQL run within the same server process, bringing optimal efficiency. PL/SQL automatically inherits the robustness, security, and portability of the Oracle Database.
However, the explosion in information technology requirements led to a humongous amount of data being available for analysis. This gave rise to two other classes of databases:

Big data systems
Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them.
NOSQL database
NoSQL (originally referring to "non SQL") databases provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
So depending on the problem at hand, engineers now had a host of RDBMS (Oracle, MySQL, Aurora etc), NOSQL (Cassandra, MongoDB, Redis etc), and Big Data systems (Hadoop) to choose from. 

This gave rise to a complex challenge. Earlier, most data analysis could be powered by a single RDBMS - for example, businesses could build intelligence using stored procedures and triggers written in PL/SQL on Oracle. In a world of multiple databases, engineers now have to learn multiple tools for data collection and analysis, such as Spark, MapReduce, and Hive.

Different ecosystems and data analysis tools have come out with their own approaches to this problem - for example, connectors to multiple databases, or custom SQL dialects compatible with a covered set of databases. However, the coverage has not been broad so far.

To address this problem, Hive ships with a language/tool called HPL/SQL. The promise of HPL/SQL is simple - it is a hybrid, heterogeneous language that understands the syntax of most procedural SQL dialects and can be used with many databases: you can run existing Oracle PL/SQL code on Apache Hive or Microsoft SQL Server, or run Transact-SQL on Oracle, Cloudera Impala or Amazon Redshift. So you can use it as a language to write new procedural SQL for Hive, and as a tool to execute PL/SQL logic that you have already written for other databases.

Compared to writing code in Python, Java or Scala, HPL/SQL is much friendlier for a BI/SQL expert, allowing them to hit the ground running faster. It offers features like:
  • Control Flow Statements
  • Built-in Functions
  • Stored Procedures, Functions and Packages
  • Exception and Condition Handling
  • On-the-fly SQL Conversion
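As an illustration, here is a minimal sketch of what procedural logic in HPL/SQL might look like. The table and column names are hypothetical, and you would need a Hive 2.0+ installation to actually run it - treat this as an indication of the flavour of the language rather than a tested script.

```sql
-- Hypothetical example: a stored procedure using HPL/SQL
-- control flow, variables, and inline SQL.
CREATE PROCEDURE describe_orders(IN min_count INT)
BEGIN
  DECLARE cnt INT DEFAULT 0;
  -- Plain SQL statements run inline, just as in PL/SQL
  SELECT COUNT(*) INTO cnt FROM orders;
  IF cnt >= min_count THEN
    PRINT 'orders table has ' || cnt || ' rows';
  ELSE
    PRINT 'orders table is smaller than expected';
  END IF;
END;

CALL describe_orders(100);
```

Saved to a file, a script like this would be run with hplsql -f describe_orders.sql.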
So, if you are exploring an efficient way to implement ETL processes in Hadoop, give HPL/SQL a try. Since Hive 2.0, HPL/SQL comes built in with Hive. All you need to do is:

To execute a command directly:
hplsql -e "SELECT * FROM src LIMIT 1"

To run a script from a file:
hplsql -f script.sql
You can go through the documentation to get more details if required.

Monday, January 01, 2018

The morning verses

I've been trying to code a website for the last couple of days. Being a side project, I had set it up as a 2017 goal and targeted developing it before 1st January.

Having worked exclusively on backend for the past 4 years, I found that the HTML, CSS and JS of today are much different from what the libraries looked like 4 years ago. I overshot the timeline, and at around 6 am, exhausted after triaging one of the last few UI bugs, I needed a diversion. Thankfully, two insects and the Sun provided the inspiration.

This is what I came up with - formatting it and posting now. Happy new year 2018, and hope you enjoy this short poem :)

The night it seemed would never end
as the code it seemed would never compile,
All my brain in thinking spent
while all my effort seemed futile.

Hours it took to dawn upon me
what silly errors had I made,
The classes somehow manipulated me
for all the errors could now fade.

For this respite I was thanking god
when two insects flew into my space,
Death with a slipper was their reward 
for flying and hitting upon my face. 

And then the birds started chirping out my door
while the sun began to rise in the sky,
The dead were dancing on the floor
as I started having a morning high. 

And thus I made peace with my brain
while the peace of my mind returned again,
For the morning air started refreshing me
while the sunshine took away my pain.