KDO - Kyle Dye Online
MongoDB Driver for CodeIgniter

I have been working with CodeIgniter for quite a while, and eventually I ran into a project where the customer wanted to use MongoDB. At first I thought about writing a database driver, but then I realized that would never work: CodeIgniter's database drivers are built around SQL, while MongoDB is a NoSQL-style document database. I eventually came across Alex Bilbie’s MongoDB plugin but found that it lacked some of the functionality I needed, so I used his wonderful code as a starting point and improved and extended it from there. The resulting code can be found on GitHub: http://github.com/kyledye/MongoDB-CodeIgniter-Driver

Please let me know what you think!

Simple web scraping in CodeIgniter

Now that I have figured out how to commit to GitHub, I am adding libraries as fast as possible. The library linked below allows quick scraping of DOM elements using XPath queries. I also added support for subqueries, which makes scraping something like a table and its rows quick and easy.

http://github.com/kyledye/CodeIgniter-Scraper-Library
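To show what an XPath "subquery" means in practice, here is a minimal sketch using only PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). It is not the Scraper library's own API; `scrape_table_rows()` and the sample HTML are hypothetical. The outer query selects the rows of the first table, and the inner query runs relative to each row (the context node), so each row's cells come back grouped together.

```php
<?php
// Sketch only: plain PHP DOM + XPath, NOT the CodeIgniter Scraper library's API.
// scrape_table_rows() is a hypothetical helper for illustration.

/**
 * Return the cell text of every row in the first <table> of $html,
 * as an array of arrays (one inner array per row).
 */
function scrape_table_rows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often sloppy; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];

    // Outer query: every row of the first table on the page.
    foreach ($xpath->query('(//table)[1]//tr') as $tr) {
        $cells = [];
        // Subquery: evaluated relative to the current <tr> (the context node).
        foreach ($xpath->query('./td|./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

$html = '<html><body><table>'
      . '<tr><th>Name</th><th>Language</th></tr>'
      . '<tr><td>CodeIgniter</td><td>PHP</td></tr>'
      . '</table></body></html>';

print_r(scrape_table_rows($html));
```

Passing the row node as the second argument to `DOMXPath::query()` is what keeps `./td` from matching cells elsewhere on the page.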

Using Multiple PostgreSQL Schemas in CakePHP

What an absolute nightmare. I use multiple PostgreSQL schemas in many different areas of my websites, and I was deterred from CakePHP because it only supports one schema at a time. I fixed this to work a bit better. Now in all of my models I have to set the $useTable variable to 'schemahere.tablenamehere', but I got it working. I just had to hack up the dbo_postgres.php file found in ./lib/cake/model/datasources/dbo. Starting at line 170, I had to change the two functions to the following:

function listSources() {

  $schema = $this->config['database']; // despite its name, this now holds the database (catalog) name
  $sql = "SELECT table_schema || '.' || table_name AS name FROM INFORMATION_SCHEMA.tables WHERE table_catalog = '{$schema}';"; // list every table as schema.table_name

  // …

}

That took care of one function and got Cake reading information_schema correctly. Now the fun begins…

function &describe(&$model) {
        $fields = parent::describe($model);
        $table = $this->fullTableName($model, false);
        $this->_sequenceMap[$table] = array();

        $schema = $this->config['schema']; // default schema, used when the table name is not schema-qualified
        $tbl = $table; // raw (unquoted) table name; quoting happens once, in the query below

        // If the table name contains at least one period, treat the part before the first period
        // as the schema name and re-join the remaining parts as the table name.
        // (preg_match("/./", ...) would match ANY character, so use strpos() instead.)
        if (strpos($table, '.') !== false):
            $tbl_arr = explode('.', $table);
            $schema = trim($tbl_arr[0]);
            unset($tbl_arr[0]);
            $tbl = trim(implode('.', $tbl_arr));
        endif;

        if ($fields === null) {
            // value() quotes and escapes each name, so unqualified tables are not double-quoted.
            $cols = $this->fetchAll(
                "SELECT DISTINCT column_name AS name, data_type AS type, is_nullable AS null,
                    column_default AS default, ordinal_position AS position, character_maximum_length AS char_length,
                    character_octet_length AS oct_length FROM information_schema.columns
                WHERE table_name = " . $this->value($tbl) . " AND table_schema = " . $this->value($schema) . " ORDER BY position",
                false
            ); // use the new query and get the values!

        // …

}

It doesn’t seem like it would be that big of a deal to implement in CakePHP, but apparently people don’t like using PostgreSQL the way it was intended…
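The schema/table split inside describe() is easy to get subtly wrong, so it helps to pull it out and check it on its own. The sketch below is a hypothetical helper, `split_qualified_table()`, which is not part of CakePHP. It assumes `'public'` as the fallback schema (PostgreSQL's default) and, like the hack above, treats everything after the first period as the table name.

```php
<?php
// Sketch only: split_qualified_table() is a hypothetical helper, not a CakePHP API.

/**
 * Split a $useTable-style name such as 'schema.table' into [schema, table].
 * Names without a period fall back to $defaultSchema (assumed 'public').
 */
function split_qualified_table(string $name, string $defaultSchema = 'public'): array
{
    if (strpos($name, '.') === false) {
        return [$defaultSchema, trim($name)];
    }
    // Limit 2: the first piece is the schema, the remainder (dots included) is the table.
    $parts = explode('.', $name, 2);
    return [trim($parts[0]), trim($parts[1])];
}

// Usage with made-up $useTable values:
echo implode(' / ', split_qualified_table('hr.resumes')), "\n"; // prints "hr / resumes"
echo implode(' / ', split_qualified_table('resumes')), "\n";    // prints "public / resumes"
```

Using `explode()` with a limit of 2 gives the same result as the explode/unset/implode sequence in the hack, in a single call.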

MongoDB - The New Document Database Hotness

Up until recently, I had worked exclusively with relational database management systems (RDBMSes) such as MySQL and PostgreSQL in my web development projects. They store data logically and link different subsets of data together in very obvious ways, which makes them fairly easy for newcomers to learn. I had learned the subtle nuances of both and consider myself an expert in many ways when it comes to writing queries in such environments.

Everything in my web development world was pretty much in balance; then, out of nowhere, Travis Hegner hit me in the face with a proverbial pimp slap of epic proportions. This thunderous blow came from a different type of database system, a non-relational, column-oriented store, in the form of HBase, the Hadoop Database. HBase is meant to be used “when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables - billions of rows X millions of columns - atop clusters of commodity hardware.”

Now you may be thinking to yourself “WTF is he talking about!?”  I will try to clarify by using an example…

A couple of years ago, we built a web spider that crawls the web for resumes. The project worked great, and we got close to 150,000 resumes into an RDBMS (PostgreSQL) in a standardized format using the Sovren Resume Parser, which adheres to the HR-XML resume formatting standards. We built a search engine that gave our recruiters access to the resumes so they could (ideally) find matching candidates for open positions (I work for a staffing company, Trillium Staffing). Unfortunately, with that much data, the queries were taking upwards of a minute to complete, even with a normalized and optimized database. We dealt with it for a long time, until we came across HBase. Since then, we have moved the system onto HBase and minimized query overhead by basically searching a giant text dump of each resume stored in a single field. The searches take at most about 5 seconds now. Pretty big improvement, eh? I’d say so, too!

OK, so this worked out great, but there isn’t a big community around this technology yet. The reason, in my opinion, is that the platform it’s built on (Java) is a PITA and eats a lot of resources. So I looked for alternatives, but to no avail.

Another downside I found was that the data being processed was mainly static, which is not great in web development because we (love to) work with dynamic data that can be updated instantly.

I kind of put this out of my mind until, randomly, my favorite CSS gallery, Best Web Gallery, ran a posting for MongoDB, a sort of RDBMS/document database hybrid. SWEET!

After poking around the site for a bit, I was pleasantly surprised at the ease of querying (JavaScript-based queries) and the support. I was even MORE pleased to find they had a sandbox area of sorts to learn and test queries in. Then, to make me even MORE IMPRESSED, there was already an Ubuntu package for it! Score! Since finding this out, I have been working on a fairly large project with it and have really been loving it. This project happens to be in PHP, but there are also drivers for Ruby and Python, so it works with Ruby on Rails, Django, etc. I love the fact that I can apply the philosophies of both relational and document databases, rapidly search GIANT amounts of data, and run Map/Reduce jobs (batch aggregations over an entire collection) right out of the box. It’s just that simple.
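Map/Reduce is an aggregation technique. A map function emits key/value pairs for each document, the emitted values are grouped by key, and a reduce function folds each group into a single value. The sketch below imitates that flow in memory, in plain PHP, by counting resumes per skill. It is not MongoDB's API: in MongoDB the map and reduce functions are JavaScript run on the server. The resume data and function names here are made up.

```php
<?php
// Sketch only: an in-memory imitation of the map/reduce pattern, NOT MongoDB's API.
// The sample documents and all function names are hypothetical.

$resumes = [
    ['name' => 'A', 'skills' => ['php', 'sql']],
    ['name' => 'B', 'skills' => ['java']],
    ['name' => 'C', 'skills' => ['php', 'mongodb']],
];

// Map: emit one (skill, 1) pair for every skill listed on a resume.
function map_resume(array $doc): array
{
    $pairs = [];
    foreach ($doc['skills'] as $skill) {
        $pairs[] = [$skill, 1];
    }
    return $pairs;
}

// Reduce: fold every value emitted for one key into a single count.
function reduce_counts(string $key, array $values): int
{
    return array_sum($values);
}

// Drive the two phases: map every document, group by key, then reduce each group.
function map_reduce(array $docs): array
{
    $grouped = [];
    foreach ($docs as $doc) {
        foreach (map_resume($doc) as [$key, $value]) {
            $grouped[$key][] = $value;
        }
    }
    $result = [];
    foreach ($grouped as $key => $values) {
        $result[$key] = reduce_counts($key, $values);
    }
    ksort($result); // sort by key for stable output
    return $result;
}

print_r(map_reduce($resumes));
```

MongoDB may call reduce more than once on partial results for the same key, so a real reduce function should give the same answer when re-applied to its own output. Summing counts has that property.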

It is my firm opinion that MongoDB not only bridges the gap between key-value stores (which are fast and highly scalable) and traditional RDBMSes (which provide rich queries and deep functionality) but also bridges the gap between modern web development and document-based databases, successfully and with elite precision.