the implications of a techie

So after taking two years of required cs classes, I am now finally ready for the “advanced classes”.
So this semester I’m taking programming languages, where we learn about the different classes of programming languages by building interpreters; models of computation, which is sort of about the different classes of computational problems; computational prob and stats, which is mostly simulating randomness with different insightful methods; and a logic class on the incompleteness theorem, which in a very general sense proves that there are things logic cannot prove.

It won’t surprise me if not a lot of people really understand what my classes are about, because they’re very specialized sub-fields within already technical fields.

There are a couple of questions that arise with the realization that what you’re studying is so particular and narrow that someone may never come across it in their entire life. You can go into science libraries and see shelves and shelves of books on topics you know nothing about. You can find lists and lists of prolific scientists and academics who have published a lot but whom you have never heard of.

The first question, then, is what do you choose to study? How do you specialize in a field? How and what you choose are difficult personal questions.

The second question is: is it worth it? Studying technical things tends to be a very painstaking and difficult process because there is a right answer. Understanding and getting to the right answer can be painfully frustrating and slow. But it’s also increasingly true that the further you go, the fewer people understand you. I mean, it’s hard enough to find people who you can connect with. It’s going to be even harder to find people who connect with you and also understand the same things. Your view of the world is shaped directly by your knowledge. And being more familiar with certain kinds of ideas, in my case computing, biases my views a lot.

But the problem of finding people who you connect with still remains. I used to believe in true love, but it seems increasingly apparent to me that being a perfect match for each other doesn’t really exist. Compatibility certainly does, but our world is so diversified that I don’t think two people can be perfect for each other. It’s the other things that hold relationships together: care for each other will be the motivation for you to learn about the other person’s world.

And with every moment, I fall deeper into the alone hole, get more involved with academics and lose that critical balance that is so vital to keeping you sane and connected to the world. Erikson’s stage of development for our age is about deciding which meaningful relationships to form in your life. And finding a significant other is important for you to be connected to the world. We are just motes of consciousness trying to find accompaniment in this lonely universe.

Sep 23, 2012

computer science thoughts before the fall

This post covers these three parts:

  1. The hierarchy of computer science problems
  2. Parts of computer science which I will/might be taking courses in
  3. Parts of computer science which I might want to work in in the future

Disclaimer: This post is just a formal way of trying to connect all these pieces together using made-up stories from my head in a very crude and atrociously inaccurate manner.

Hierarchy of computer science problems

The hierarchy is as follows

  1. Computability Theory
  2. Computational Complexity Theory
  3. Design and Analysis of Algorithms
  4. Improving implementation

The first thing to understand before I delve in is what computer science actually does. A definition I found particularly insightful comes from Philip J Guo, the author of The PhD Grind, who says that Computer Science is Efficiently Implementing Automated Abstractions.

The thing people need to understand about computer science is why it is revolutionary: we learned how to control voltages in circuits and built layers and layers of abstraction on top of that, so that we could have machines/computers that carry out instructions perfectly without any chance of error (with some exceptions). Building machines and writing instructions for them to compute has been far more efficient and productive than teaching humans and other organisms to compute, because machines don’t need food or motivation, only electricity. (Although there has been some headway into cellular computation.)

But this automation has a limit. Some problems are undecidable/non-computable. I must emphasize that there is a difference between questions that are undecidable, questions that are unsolved and questions that simply have no answer. This is where Computability Theory steps in. Turing formalized it in the 1930s, when computers barely even existed, and showed that certain problems are undecidable (not Turing computable). That was the beginning of theoretical computer science, I suppose you could say.

I think I will be taking PHIL1680A, which is Topics in Logic: Incompleteness. It’s sort of related to this topic but more about metalogic, or basically how logic is limited by itself as well. This class is more of an experiment for me as I don’t really have any formal training in logic, but the topic seems very interesting and relevant to computer science. I hope to get some formal logic training out of it and some mind-blowing insights into logic itself.

Then computers/mainframes started becoming popular, and we started trying to solve harder and more difficult problems. For each of these problems there is an algorithm, a step-by-step procedure for calculation. Algorithms are not a construct unique to Computer Science. (Many may disagree, but) in fact, most of the math we’ve learnt consists of algorithms. Take finding the sum of two large numbers: you start from the rightmost digit, find the sum of those two digits, carry 1 over if needed, then proceed to the digits to the left and add the carry. That, I think, is also an algorithm. So people started developing algorithms, and realized that each one has different speeds (runtimes) in the best, worst and average cases. This is Analysis of Algorithms. There are more than a dozen ways to sort an unordered list of numbers/strings into an ordered one, each with a different best/average/worst-case big-O runtime, as it is called in computer science.
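To make the grade-school addition example concrete, here is a minimal Python sketch of it. The function name add_digit_strings is made up for illustration, and it assumes both inputs are non-negative whole numbers written as strings of decimal digits.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal digit strings."""
    result = []          # digits of the answer, least significant first
    carry = 0
    i, j = len(a) - 1, len(b) - 1   # start from the rightmost digits
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        total = da + db + carry
        result.append(str(total % 10))  # digit that stays in this column
        carry = total // 10             # 1 if we need to carry, else 0
        i -= 1
        j -= 1
    return "".join(reversed(result)) or "0"


print(add_digit_strings("958", "67"))
```

The loop touches each digit once, so the work grows in step with the number of digits, which is the kind of statement Analysis of Algorithms makes precise.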

I am hopefully going to take cs157 in the spring, which is our algorithms class, learning all the different algorithm classes, most likely the deterministic polynomial ones. This class will give me familiarity with all the different classes of algorithms I will ever come across as a commercial software developer if I decide to work as one. It will also be a class heavy in proof writing and will help me build more mathematical maturity.

But then, during this process of writing algorithms and proving their runtimes, larger patterns started emerging. Computer scientists started realizing that for some problems the runtime grows far faster than the size of the input, e.g. a very famous computer science problem that some of you may have heard of, the “Traveling Salesman Problem”: given a list of cities and their pairwise distances, find the shortest route that visits each city once and returns to the origin city. Different complexity classes started emerging, with different capital letters and subscripts denoting the differences, e.g. P (polynomial time), NP (nondeterministic polynomial time) and EXP (exponential time). The big question of this century is whether P = NP. It’s one of the Clay Mathematics Institute’s Millennium Prize Problems, and resolving it would have huge implications for everything from cryptography to the efficiency of our economic markets and protein structure prediction. I am going to paraphrase something my research professor heard at a talk he attended: “Computer scientists are in hell, dealing with the complex intricacies and faultiness of computer systems, while mathematicians are in heaven, hidden away by beautiful abstractions. But once in a while a question from hell, from computer science, P = NP, rises up to heaven and baffles even the mathematicians.”
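To give a feel for why the Traveling Salesman Problem gets out of hand, here is a rough brute-force sketch in Python. It is only an illustration: the function name shortest_tour and the little 4-city distance table are invented for this example, and real solvers are far smarter than this. It simply tries every ordering of the cities, and with n cities there are (n-1)! orderings to try.

```python
from itertools import permutations


def shortest_tour(dist):
    """Brute force: try every order of cities 1..n-1, starting and ending at city 0."""
    n = len(dist)
    best_length, best_tour = float("inf"), None
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        # Sum the n legs of the round trip.
        length = sum(dist[tour[k]][tour[k + 1]] for k in range(n))
        if length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Hypothetical pairwise distances between 4 cities (symmetric).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(shortest_tour(dist))
```

Four cities means only 6 orderings, but 20 cities would already mean 19! of them, which is more than 10^17.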

I am going to take CS51 this upcoming fall, which is the Theory of Computation class in our department. From what I understand, there will be discussion of the aforementioned complexity classes, and a lot of work using reductions to go from one formulation of a problem to another that can be identified as a clear member of one of the complexity classes. There will also be work on automata theory.

Finally, computer science boils down to the implementation of these algorithms, programming, which is what I think many people mistakenly believe computer science is all about. But it must be said that algorithms are only useful if someone implements them and applies them to some problem. And even the implementation itself varies greatly depending on the language and on your understanding of computer systems. The bulk of the work in systems and software engineering is on the differences between implementations and the related performance increases.

I’m TA-ing cs33 in the fall, which is basically this: an intro to computer systems and low-level programming. I admit my training in this area has been weak and lacking for the most part, so this will be a good chance for me to work on it.

Which part of computer science, or which part of the hierarchy, I want to work in post graduation is a very good question that I’m trying to answer as well. The first two parts, 1) Computability and 2) Computational Complexity, pretty much confine me to academia, and that doesn’t appeal to me as much. Somehow, I’m not too enamored with the idea of commercial software development either. Software development interviews are very intimidating and being a code monkey is not really what I want to do. This sort of leaves 3) Design and Analysis of Algorithms, which can be done in a non-academic setting in certain computational science fields, but again that will and must include some software development. I am also including subfields which require more mathematics, e.g. machine learning, in this classification. I’m not too sure where I stand on the spectrum yet, but hopefully these few courses this year will help me answer those questions. Of course, this completely leaves out things like product manager and business-side roles that involve less computer science.

The question of jobs is slowly looming in my head like a dark cloud overhead, and soon the question and the application forms will consume me.

coder

just finished my final presentation for cs32, the notorious software engineering course at the institution where i study.

i’ve always been aware of the fact that i’m not actually a very good programmer. and that statement is truer than ever after this semester of 32. and i wanted to look at this statement from a variety of angles and tear it apart.

cause 1: i’m not a pre-planning coder (at all).

i tend to see a problem, think for a while, decide i have a good enough solution and then try to write it up in code.
i get a bug, then change my code to fix it, then iterate this process until it “works”.
this is bad on multiple levels.
i have a good idea of what the structure is in my head, but never a complete picture.

so it’s very easy to say i’ll just write this method or give this variable a dumb name, or copy and paste this code here and there, and then you base your future code on this sort of messy hack and so forth, and the whole piece of code just spirals out of control.
this gets exponentially worse when you’re writing medium sized projects, especially without an ide which lets you refactor automatically.

so i’m a scrappy programmer. i write things really quickly but they don’t always work, and things spiral out of control and become exponentially more difficult to maintain and debug. and i add more if var != null statements, and include more checks, and eventually it works. but by now, it looks like a giant mess which someone popped on.

i can’t really tell if that in itself makes me a shitty programmer, or if it’s just a trait which other people have had success with. this is in contrast to other coders (who i need to give credit to) who meticulously spend lots of time laying the foundations of the project, thinking and rethinking for a better solution. they don’t start until they’re absolutely sure that there’s nothing better.

i don’t do all of the aforementioned crap without some faulty reasoning.

i have a very short attention span and like to jump between algorithms, and problems in my head. and i find it much more rewarding to see connections between different algorithms and problems than try to solve the same problem three times in progressively cleaner, and more elegant ways. if i have a solution, i usually keep it rather than spending more time to come up with a better one.

i also have a hard time believing that when you’re working on a complex program you can ever have the whole program in your head. i’m at the peak of my biological cognitive abilities. people are sharpest and smartest when they are in their early 20’s. so it’s ok now for me to say you’ve got to practice going through this insane loop with 10 changing variables and 4 break statements. but i know that in 20 years’ time, there will be no bloody way for me to execute a task of this cognitive complexity.

i also generally don’t believe that you can predict all the complicated interactions between different parts of the program. you inevitably have to make modifications, so why spend so much time on this perfect little castle if it’s going to have to be torn down anyway?

any of you who have made it this far down the post are probably interested in this question. so help me out a little bit and tell me what i should do. is it just always better if i take a painstakingly long time to lay out the foundations and formulate all the problems in my head before i write anything down? do i gain anything from re-attacking the same problem multiple times for a more elegant solution? would that be considered premature design? but those kinds of design questions are especially important when you start looking at things like threading, concurrency, scalability. what is the distinction between design and programming? is there a divide at all?

is programming inherently an act where you have to keep modifying and adding to the same piece of code again and again as you make it more complex? is there any gain for me in trying to get a perfect solution in the beginning? it seems to me that it’s somewhat related to the difference between writing a very verbose novel and spending a lot of time deliberating over the word choice for a poem.

data implies monopoly

here are the two trends in business:
data
monopolization

there used to be diseconomies of scale
but technology is converting those diseconomies of scale into economies of scale

advanced communication systems
is that going to benefit small companies with two regional offices more
or large MNC’s more

data is giving powerful knowledge to people who wouldn’t otherwise have this specialized knowledge
but is that going to benefit a small town book store owner who knows the predominant demographics and relevant statistics in the area or
a Barnes & Noble manager who can now find that out by doing some analytics

we’re removing the need for workers with specialized knowledge.
we’re removing the regional specialist.
we’re removing the people who do the dirty work

it’s going to be a monopoly.
it’s funny when we think we control our lives. in fact, our lives are controlled by a few.

cs and finding jobs

this came up on the top visualizations of 2011 from visual.ly.

it’s really exciting to talk about all these fields and discuss how they are changing our lives with friends, but to be honest, i am still inside the center, desperately trying to learn all the pre-requisite knowledge before i can talk of being up to date.

i had a really awesome time learning about computer systems this semester under pvh. it gave me a much deeper understanding of computers, something which i’ve always wanted to learn. but a recent talk with mbwong has led me to look into this exploding field centered around data: machine learning, comp bio, artificial intelligence. i found some information and have subsequently decided to make my own brainstorm. participation is encouraged.

http://www.mindmeister.com/maps/public_map_shell/130808206/computer-science?width=600&height=400&z=auto
and then i realized that i had always loved big data all along.

as a young kid, my favorite books were the huge ones with cross sections of buildings and structures with gorgeous illustrations explaining every part. i no longer have it on my shelf but i loved this star wars book so so much. (why don’t they make good ones anymore?) it was the perfect book, combining nerdiness and big data.

i’ve been listening to some lectures on ml-class.org, one of three classes that were offered by stanford to the public. i currently plan to take machine learning next semester but i’ll decide after the first few weeks.

the real problem is i still haven’t really decided what kind of field i want to specialize in. i’ve been watching the first lecture of  many courses on opencourseware to get a feel for each subfield inside cs but i’m still unsure.

the other thing that’s been on my mind is jobs/internships.

after getting into college, my naivety led me to believe that studying cs at a good university would automatically give me tons of internship offers. so i haven’t really been on my game in terms of preparing myself and my resume for jobs. most of the things i’ve done have been short-lived and random, so they look terrible on a cv.

but i’ve recently started to realize how many candidates are better qualified than me. the comforting thing to think is that they always say they will hire great talent as soon as they see it, but the problem is becoming that great talent. with the limited knowledge i have under my belt, i don’t think i stand out as an applicant compared to many of my peers.

my uncertainty about my interests is also a concern, since i don’t really know what job to apply for if i don’t know what i want to do.

i can’t decide whether, for the rest of my winter break, i should devote more of my time to watching lectures and learning things or to looking into ways to beef up my resume. advice?

my deal with social media

can you imagine a day without facebook anymore? without tags, likes, comments, hashtags? what did we do before that?

for those of us who grew up with the internet boom, who wrote e-mails before we wrote real letters, we understand the word media very differently from those who precede us, even our parents.

to list a simple example, it took me and my brother a long time to explain to my mom why people uploaded Naruto episodes onto youtube for free. our reason was social recognition.

it took PR firms and the public months and years before some corporations finally put down the corporate PR face and allowed employee blogs and personal releases.

it took a long time too for me to convince myself to publicize my blog, have a linkedin profile, put stuff onto social network sites. because if you think about it, it’s absolutely amazing how much information we now have of each other after just a few years of voluntarily putting up information on sites like facebook, twitter and blogs.

the stuff i have publicized is probably more than what has been published by many authors whose understanding and craft of the English language exceed mine.

slowly, over the generations, we’ve seen the barriers to entry for expression come down. it used to be that you had to be a reputable and rich man to start a newspaper. now, the barrier to entry for publishing your first blog post is a matter of minutes, as wordpress famously advertises. it takes a few taps of the thumb to publish your less-than-140-characters tweet to your followers.

the world is changing, and one of the forces changing it is social media. more and more people are doing things not the Taylor way, but for social recognition or simply because they like doing it.

everything is a social experiment, including this blog series, this one-way street into the increasing heap of digital noise that is changing the way we understand media. we don’t know where it’s headed but no one wants to be the last to arrive at the scene anymore.

mysequel to pilot

The internet is a black hole for my time. Since I acquired the addiction of compulsively using the Ctrl + Click combination on links at a young age, it doesn’t take me long to fill a browser with lots of tabs, like the one I have open now, and then I need the time to digest all these links I’ve opened.
Another friend of mine, Greg J-D, wins the honorary title of being the most compulsive one of all. I once saw 50 tabs open in a browser on his computer.

The start of this series represents an internal desire to be more picky with my internet digestion choices, so that I will learn to chew and process the stuff I read on the internet and hopefully be able to single out an area which I’m truly interested in. It’s easy to open lots of links, but it’s hard to decide which ones are worth your time to read. I also want to use this series to share my knowledge of technology with everyone who reads this. Whether you want it to or not, it’s taking over our lives more and more. Facebook is now the first site I open when I open Chrome. So please leave a comment on what you want to hear and I’ll do my best to share.

Today’s post will be about MySQL and NoSQL databases, and their implications for facebook.

SQL (pronounced sequel or S-Q-L) is short for Structured Query Language.
It’s most often seen in the form of MySQL, which is a type of database that you talk to using SQL. A database is somewhere where you store something.

So you may be scratching your head: what is a MySQL database? MySQL databases are a form of RDBMS (relational database management system). Think of an RDBMS or a MySQL database as an excel file.
In an excel file, you have headers on the first row, like this: http://amiworks.co.in/talk/wp-content/wbw-xls2.jpg.
Then, for each entry, you input new values on a row.

That’s the gist of an RDBMS.
Facebook was built from the start on a combination of MySQL + memcached. (Memcached is an in-memory key-value store that specializes in caching, something we can talk about next time.)
If you think about it, it makes a lot of sense.
When we first got Facebook, there weren’t a lot of features. Every user had to input their name, sex, interested in, relationship status, profile pic, favorite books. Stuff like that.
So it totally is possible that you can input all of that in an excel file, right?
You could even make your own facebook of the names, birthdays and sex of all your friends in excel.
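Here is a small sketch of that excel-file idea as an actual table. It uses Python’s built-in sqlite3 module as a stand-in because it runs without a server; that is SQLite, not MySQL, although the SQL for something this simple looks much the same. The friends table and the two people in it are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")

# The column names play the role of the header row in the excel file.
conn.execute("CREATE TABLE friends (name TEXT, birthday TEXT, sex TEXT)")

# Each entry is one new row of values under those headers.
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?)",
    [("Alice", "1991-03-14", "F"), ("Bob", "1990-11-02", "M")],
)

# Because every row has the same columns, asking questions is easy.
rows = conn.execute(
    "SELECT name, birthday FROM friends WHERE sex = ? ORDER BY name", ("F",)
).fetchall()
print(rows)
```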

But if you’re old like me and used msn messenger, you’ll remember that there was a limit on the number of friends you could have (or the e-mail chains that went around saying Microsoft was going to close down MSN messenger).
Or, slightly more recently, you’ll remember when you could only put 60??? photos in a facebook album. Then you can sort of realize the limitations of MySQL.
The problem with MySQL is that if there is no header for a piece of information, there is no place to put it.
The fundamental problem is that the columns are fixed. Even if you can add an unlimited number of columns, you still have to add a column every time you have a new piece of information, and you have to update all the previous entries that were created without this column.
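A quick sketch of what adding a column looks like, again using Python’s sqlite3 module as a stand-in for MySQL (an assumption of this example; the users table and its fields are invented). In SQLite at least, the old rows are not rewritten for you: they simply show up with an empty (NULL) value in the new column until someone goes back and fills it in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, relationship_status TEXT)")
conn.execute("INSERT INTO users VALUES ('Alice', 'single')")

# A new feature needs a new piece of information, so the schema has to change.
conn.execute("ALTER TABLE users ADD COLUMN favorite_books TEXT")
conn.execute("INSERT INTO users VALUES ('Bob', 'married', 'Dune')")

# Alice was created before the column existed, so her value is NULL (None).
rows = conn.execute(
    "SELECT name, favorite_books FROM users ORDER BY name"
).fetchall()
print(rows)
```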

Structure gives us speed, gives us organization.
But if you want to develop more features and more unrestricted services, e.g. facebook messages between different people, or accommodate that power user who wants 61 photos in his album rather than 60, then you have to move to a different paradigm, which is where NoSQL comes in.

NoSQL is not a database. It is a group of databases that are grouped by the fact that they are not Relational Database Management Systems, not excel files, but things like KV (key-value) stores.
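To show the key-value idea in miniature, here is a toy sketch in Python. It is just a dictionary with put and get helpers (names invented for this example), not how Redis, MongoDB or any real NoSQL system is implemented. The point is only that each value can have its own shape, with no shared header row.

```python
import json

store = {}  # toy key-value store: key -> JSON text


def put(key, value):
    """Save any JSON-friendly value under a key."""
    store[key] = json.dumps(value)


def get(key):
    """Return the value for a key, or None if the key is missing."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


# Two users with different fields; nobody had to add a column first.
put("user:1", {"name": "Alice", "relationship_status": "single"})
put("user:2", {"name": "Bob", "favorite_books": ["Dune"], "photos": 61})

print(get("user:2"))
```

The trade-off is the one described above: you gain flexibility, but you lose the guarantee that every entry has the same structure.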

Even facebook’s new message system was implemented using HBase, an open-source NoSQL storage system.
NoSQL represents a shift in the database world towards more flexible storage systems that don’t sacrifice performance. The cycle of technology is getting faster and faster. Even a company as young as facebook, compared to corporate giants like General Electric and AT&T, is now considered behind and slow on the technological side.
MongoDB, Redis, Membase and Memcached are just a few of the names that will be reshaping the technological world in the next few years. Not to mention, they’re open source and free.
What has the world become?

Please leave a comment below and I’m looking for suggestions for the topic of the next blog post.