Tuesday, December 30, 2008

Perform Case Sensitive Search with Google

Google says that they treat search queries as case-insensitive: all letters in a search phrase are interpreted as lower case. For example, searches for barack obama, Barack Obama and BARACK OBAMA will all return the same results on Google.

There are, however, instances when the case of a search query is as important as the search phrase itself, because a word's meaning can change with its case. Examples of such capitonyms include March (the month) and march (to walk), Polish (the language of Poland) and polish (to shine), Bill (a person's name) and bill (an invoice), etc.

For instance, "Ram" is the name of a Hindu god while "RAM" is an abbreviation for Random Access Memory. The two share the same spelling, and it's the case that tells you the real context of the word. Unfortunately, Google searches are not case-sensitive (they fold case), and hence most search results for either Ram or RAM are about the "temporary" computer memory.

To solve this problem and help you conduct case-sensitive searches on Google, someone has created a Google App Engine-powered search engine at Case Sensitive Search: it scans through Google's search results and keeps only the results that match the case of your search query.
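The filtering idea is straightforward to sketch. Below is a minimal illustration in Java with hypothetical data (the service's actual implementation is not public): take the snippets a normal, case-insensitive search returns, and keep only those containing the query with exactly the case the user typed.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CaseSensitiveFilter {
    // Keep only the snippets that contain the query with exactly the case
    // the user typed. Word boundaries stop "RAM" from matching "RAMble".
    static List<String> filter(List<String> snippets, String query) {
        Pattern exact = Pattern.compile("\\b" + Pattern.quote(query) + "\\b");
        return snippets.stream()
                .filter(s -> exact.matcher(s).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> results = List.of(
                "Ram is a major deity in Hinduism",
                "Upgrade the RAM in your laptop",
                "ram pressure stripping in galaxies");
        System.out.println(filter(results, "RAM"));
        // [Upgrade the RAM in your laptop]
    }
}
```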

Coming back to the original example, here's a comparison of case-sensitive Google search results for "Ram" vs. "RAM".

Google Search - Case Sensitive


Saturday, December 27, 2008

Million Ways to Kill your Project

All too often I find people introducing all forms of accidental complexity and screwing up their projects. Over the years I've learnt some powerful ways to kill a project/organization.

Mediocrity over Innovation and Excellence

Indifference (I don’t care) over Passion and Pride

Sloppiness over Craftsmanship and Self-Discipline

are some of the most common values. And there are many ways to encourage them:

  • throwing more people at a problem

  • no visible value system

  • treating your employees as dispensable resources

  • punishing failures and ignoring achievements

  • creating more and more specialized roles on a project (Architects, Designers, Java Developers, Database Developers, UI Developers, DBAs, Manual Testers, Automation Testers, Regression Testers, Performance Testers, Graphics Designers, Web Designers, User Experience Experts, Domain Experts, Business Analysts, Subject Matter Experts, System Analysts, Technical Writers, Project Managers, Program Managers, Module Leads, Tech Leads, Configuration Managers, Build Monkeys, Product Owners, Scrum Masters, Consultants, etc.)

  • building all the possible frameworks that might ever be needed before building the application

  • trying to build a very generic solution that is infinitely scalable and extensible (it doesn't matter that you are building a hospital management system; it needs to be generic enough that if the business decides tomorrow to get into hotel management, they can reuse it)

  • using the latest and greatest technology buzzwords, frameworks and concepts

  • death by process and meetings

  • responding to failures and slippages by adding more process and demanding stricter process adherence and evaluation

  • And the list goes on…

Tuesday, December 23, 2008

Calvin and Hobbes for December 22, 2008

Monday, December 15, 2008

Comic for December 15, 2008

Saturday, December 13, 2008

Strategy: Facebook Tweaks to Handle 6 Times as Many Memcached Requests

Our latest strategy is taken from a great post by Paul Saab of Facebook, detailing how, with the changes Facebook has made to memcached, they have:

...been able to scale memcached to handle 200,000 UDP requests per second with an average latency of 173 microseconds. The total throughput achieved is 300,000 UDP requests/s, but the latency at that request rate is too high to be useful in our system. This is an amazing increase from 50,000 UDP requests/s using the stock version of Linux and memcached.

To scale, Facebook has hundreds of thousands of TCP connections open to their memcached processes. First, this is still amazing. It's not so long ago that you could never have done this. Optimizing connection use was always a priority because the OS simply couldn't handle large numbers of connections, large numbers of threads, or large numbers of CPUs. Getting to this point is a big accomplishment. Still, at that scale, there are problems that must be solved.

Some of the problems Facebook faced and fixed:

  • Per-connection consumption of resources. What works well at a low number of inputs can totally kill a system as inputs grow. Memcached uses a per-connection buffer, which adds up to a lot of memory that could otherwise be used to store data. There's nothing wrong with that design choice, but Facebook switched to a per-thread shared connection buffer and reclaimed gigabytes of RAM on each server.

  • Kernel lock contention. Facebook discovered under load there was lock contention when transmitting through a single UDP socket from multiple threads. Sockets are data structures too and they are subject to the usual lock contention issues. Facebook got around this issue by maintaining separate reply sockets in different threads so they would not contend with the receive sockets. They found another bottleneck in Linux’s “netdevice” layer that sits in-between IP and device drivers. They changed the dequeue algorithm to batch dequeues so more work was done when they had the CPU.

  • Application lock contention. Nothing brings out lock issues like moving to more cores. Facebook found that when they moved to 8-core machines, a global lock protecting stats collection consumed 20-30% of the CPU. In applications that require little processing per request, as memcached does, this is not unexpected, but doing real work with your CPU is a better idea. So they collected stats on a per-thread basis and then calculated a global view on demand.

  • Interrupt floods and starvation. With so much traffic directed at a single server, the hardware can flood the CPU(s) with interrupts and keep the CPU from doing "real" work. To get around this problem, Facebook implemented some complicated strategies to load-balance IO across all the cores. As I am less clever, I might try more network cards with a TCP offload engine.

  • When you read Paul's article, keep in mind the incredible number of man-hours that went into profiling the system: not just their application, but the entire software and hardware stack. Then add in the research, planning, and trying of different solutions to see if anything changed for the better. It's a lot of work. Notice that using a nifty new parallel language or moving to a cloud wouldn't have made a bit of difference. It's complete mastery of their system that made the difference.
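The per-thread stats fix maps directly onto a pattern that later became standard in the JDK. As a sketch (this is an illustration of the idea, not Facebook's code), java.util.concurrent.atomic.LongAdder keeps striped, effectively per-thread counters on the hot path and only combines them into a global total when one is actually requested:

```java
import java.util.concurrent.atomic.LongAdder;

public class StatsDemo {
    // LongAdder updates striped (effectively per-thread) cells, so worker
    // threads don't contend on a single global lock; sum() walks the cells
    // and computes the global view only on demand.
    static long countRequests(int threads, int perThread) throws InterruptedException {
        LongAdder requests = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) requests.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return requests.sum();  // aggregate only when stats are requested
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countRequests(4, 10_000));  // 40000
    }
}
```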

    A summary of potential strategies:

  • Profile everything. Problems are always specific. The understanding of the problem must be specific. The fix must be specific.

  • Burn profiling into your regression tests. Detect when and where performance tanks as a regular part of your build.

  • Use resources in proportion to what grows slowest. This requires multiplexing, but at least your resource usage is more predictable and bounded.

  • Batch work. When you have the CPU do all the work you possibly can in the quantum or the whole system grinds to a halt in processing overhead.

  • Do work and maintain resources per task. Otherwise locking for shared resources takes more and more time when there's less and less time to do the work that needs to be done.

  • Change algorithms. Sometimes you simply need to do things differently. Tweaking will only get you so far.

  • You can find their changes on GitHub, the hub that says "git."
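The "batch work" strategy can be sketched with a standard BlockingQueue: drain many items per queue access instead of paying synchronization cost once per item, the same spirit as the batched netdevice dequeue described above. A toy illustration, not Facebook's code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchDequeue {
    // drainTo() takes the queue's internal lock once for up to `max` items,
    // instead of once per item as repeated poll() calls would.
    static int drainAndSum(BlockingQueue<Integer> q, int max) {
        List<Integer> batch = new ArrayList<>();
        q.drainTo(batch, max);
        int sum = 0;
        for (int x : batch) sum += x;  // stand-in for real per-item work
        return sum;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(16);
        for (int i = 1; i <= 5; i++) q.add(i);
        System.out.println(drainAndSum(q, 128));  // 15
    }
}
```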

    Comic for December 13, 2008

    Monday, December 8, 2008

    Java 7 - small language changes

    Today's news that there are likely to be language changes in JDK 7 was a bit of a surprise.
    It seemed like the chance had passed.

    Small language changes for JDK 7

    Joe's post isn't very long, but it is clear.

    • "I'll be leading up Sun's efforts" - so it's supported by Sun

    • "develop a set of ... language changes" - so it's more than one change

    • "small language changes" - small means no closures or properties

    • "we'll first be running a call for proposals" - community involvement

    • "in JDK 7" - in the next version (but why does the blog not say Java 7?)

    I've used this blog, and previous visits to JavaPolis (now Devoxx) to discuss possible language changes.
    Some have been good ideas, others not so good.
    The main point was to provide a place for discussion.

    The next phase after that was to implement some of the language changes.
    This has been achieved with the Kijaro project.
    As far as I'm concerned, anyone can write up an idea for a Java language change, and then I'll provide
    commit access at Kijaro for it to be implemented in javac - no questions asked.


    So, before we all submit lots of crazy ideas and get carried away, let's remember that Sun provided some hints at JavaOne 2008.
    This presentation
    includes the following as ruled out:

    • Operator overloading (user defined)

    • Dynamic typing

    • Macros

    • Multiple dispatch / multi-methods

    And the following as 'under consideration':

    • Multi-catch in exceptions

    • Rethrowing exceptions
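For readers unfamiliar with the proposal, multi-catch lets one handler cover several exception types instead of duplicating the catch body. This sketch uses the syntax that was eventually adopted in Java 7; the exact syntax was still open when this was written:

```java
public class MultiCatchDemo {
    // One catch clause handles two unrelated exception types.
    static String parse(String s) {
        try {
            return "parsed: " + Integer.parseInt(s.trim());
        } catch (NumberFormatException | NullPointerException e) {
            return "bad input: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" 42 "));  // parsed: 42
        System.out.println(parse("oops"));  // bad input: NumberFormatException
        System.out.println(parse(null));    // bad input: NullPointerException
    }
}
```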

    The following are listed as 'long term areas of interest':

    • Parallel algorithms

    • Versioning of interfaces

    • Delegation

    • Extension methods

    • Structural typing

    • Pluggable literal syntaxes

    So, there is already quite a wide list on the table.
    Plus, there were other ideas suggested at last year's JavaPolis, by both Josh and Neal:

    • Variable declaration type inference (for generics)

    • Enum comparisons

    • Switch statement for strings

    • Chained invocations, even when method returns void
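Of these, the switch statement for strings is the easiest to illustrate. The example below uses the syntax that later shipped in Java 7; at the time of writing it was still only a suggestion:

```java
public class StringSwitchDemo {
    // Switching directly on a String instead of chained equals() checks.
    static int daysIn(String month) {
        switch (month) {
            case "February":
                return 28;  // ignoring leap years for the example
            case "April": case "June": case "September": case "November":
                return 30;
            default:
                return 31;
        }
    }

    public static void main(String[] args) {
        System.out.println(daysIn("June"));      // 30
        System.out.println(daysIn("February"));  // 28
        System.out.println(daysIn("August"));    // 31
    }
}
```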

    In addition to all the above, I strongly suspect that there isn't going to be a chance to tackle
    the problems with generics. This is similar to closures and properties: there isn't the timescale or manpower
    to tackle these big issues in the timeframe being talked about (especially now that Neal Gafter works for Microsoft).

    My ideas

    Well, I'll mostly save those for another day.
    But I would like to see a proper consideration of enhancements to help with null handling.
    And enhancements to the for-each loop.

    Saying NO!

    Finally, an appeal to Sun.
    Many in the community are deeply sceptical of any language changes at this point.
    The message should be simple - people need to feel that there is a clear means to vote or argue AGAINST
    a proposal, just as much as to make suggestions FOR change.
    Although I don't expect to make much use of the facility, I know there are many that do want to express this opinion.


    This is a new departure for Sun in so openly asking for ideas from the community.
    We need to respond and reply with thoughtful ideas for what will and will not work.