Raible Designs

Keeping development and production databases in synch

We're getting ready to put the app I developed into production and we're facing an interesting issue. This app is designed to be a simple interface for mapping an "attribute value" from one system to another system. Currently, we do this with SQL in our dev database, and then copy these tables' data to production when we deploy.

However, with this application, we now have the ability to push out the "mapping responsibility" to our users. The problem is that if our users enter the data in production, we have to figure out a way to synch between dev and prod. We constantly need to enter our own mappings in while we're developing - so there's always going to be data entered in both dev and prod databases.

Is it possible to share the same sequence tables b/w Oracle instances and use triggers to do a second insert into dev when someone does something in prod (or vise versa)? I suppose a manual solution is possible - where we make sure and enter values into prod when we enter them into dev, but that's just ugly.

Update: TheServerSide has additional coverage of this topic.

Posted in Java at May 03 2005, 02:38:53 PM MDT 14 Comments

Comments:

You have a couple of options in Oracle: If you want, you can simply specify the schema name as a prefix for your tables with the proper permissions and everyone can access the data. The best option seems to be for you to create the tables in a "settings" schema then create views from each DB into the settings schema to get at the data. You wouldn't have to specify the schema name and if you want you can simply change the view to point someplace else for your own sandbox type of development.

Posted by Ryan on May 03, 2005 at 09:10 PM MDT #

You could use GUIDs instead of sequences as your surrogate key: http://www.jroller.com/page/jcarreira/20041215#overcoming_the_hashcode_object_identity On a related note, I was wondering just a few minutes ago why Appfuse uses "business" keys (User.username, Role.name, etc.) rather than (generated) surrogate keys. The Hibernate guys strongly recommend always using surrogate keys. Opinions? Any thoughts on how hard this would be to change in Appfuse?

Posted by David Carter on May 03, 2005 at 09:26 PM MDT #

David - I doubt our DBA is going to allow us to switch from sequence tables to GUIDs, but it's a good idea nevertheless. As far as the "business" keys, I'm using user.username and role.name as primary keys b/c I've found it easier let the database handle duplicates - since we don't allow duplicate usernames or role names anyway.

We used to have a user.id for the primary key, but then it was impossible to use Hibernate's one-to-many to link user to role in the user_role table. This was when we were using CMA and had little control of the SQL query.

I'm sure it could be refactored now that we're using Acegi Security. But if it ain't broken - why fix it? ;-) Using the username as the primary key makes it easier to integrate with LDAP.

Posted by Matt Raible on May 03, 2005 at 09:37 PM MDT #

With Hibernate, the database can still handle duplicates, just specify unique="true" on the @hibernate.property xdoclet tag.

Hibernate should also transparently handle the one-to-many (or isn't it many-to-many?) between users & roles, even using surrogate keys.

I understand the CMA integration issues, but as you say, that should no longer be an issue when using Acegi. Maybe this question could be revisited once people get comfortable with the CMA/Acegi change? I look at Appfuse as a repository for best practices, and it seems that surrogate keys are the recommendation I keep seeing put forth as a best practice. Admittedly, it's not a very big deal with simple classes like user & role, but it can get to be a much bigger deal with more complex classes, where composite business keys would need to be used in the absence of a surrogate key.

Posted by David Carter on May 03, 2005 at 10:54 PM MDT #

You could use database trigger and database link to be a poor man's replication. Just remember to catch the trigger error if the dev database is down or influx for development, so not to mess up production Good luck! I enjoy your post.

Posted by J-High on May 04, 2005 at 01:57 AM MDT #

Can you use stored proc? If yes, into it you can do both inserts.

Posted by Gilberto on May 04, 2005 at 11:38 AM MDT #

Should you do this? Seems like an unnecessary impact on production... Why not write scripts to pull production down to dev when you need to test anything the user enters. I can't understand why you'd really need a real-time refresh of development data? If you're database is small enough that it can handle the overhead of writing dual copies of inserts, then it should be easy to replicate on a daily basis.

Posted by greg akins on May 04, 2005 at 11:58 AM MDT #

Hi. What is amount of development data? I was always annoyed if there was need to enter some data to database at development stage. If you recreated DB structure, you should re-enter data. Probably it's better to keep development-level mapping in some format (XML for example) and load them to the database at deploy-time. it's just idea of course. if you really need real-time-sync, other approaches should be used.

Posted by v.kvassov on May 04, 2005 at 01:25 PM MDT #

Why do you use sequence tables instead of sequences ? That's really suboptimal, not to mention unnatural given that you have sequences in the first place.

Posted by Radu-A. Popescu on May 05, 2005 at 07:24 AM MDT #

I do not know the exact details of your datamapping/transformation challenge, but over the years I've developed the following approach.

1. specify the the static part of the data.
2. create a holding area for the static data both in the database and on file. (therefore you can put it under source control, cvs/sourcesafe/subversion what ever tickles your fancy) 3. specify/establish the rules for transformation.
4. load the static data from file in your holding table.
5. transform your static data into final data.

Posted by m.dankoor on May 05, 2005 at 08:39 AM MDT #

You might want to take a look at DBUnit (http://dbunit.sourceforge.net/apidocs/org/dbunit/dataset/CompositeDataSet.html) for retrieving data from production and development and then merge them into a single dataset. A simple batch program can automate this process and commit a merged dataset to your projet's source repository at regular intervals so that your developers get regular updates which can then be loaded in their own development schema (if they have one) in addition to a shared development DB from which data is merged with production. Cheers, Thomas

Posted by Thomas Van de Velde on May 05, 2005 at 01:46 PM MDT #

Radu-A: I meant sequences, not sequence tables. ;-)

Posted by Matt Raible on May 05, 2005 at 01:50 PM MDT #

We had a similar issue many years ago at a startup; we had personalization and rules information that could be entered by business users in production or the prerelease environment, and both needed to be kept in sync. The original solution was to use a set of triggers that copied the changes into a q table, but this was somewhat unreliable and ugly. The real key is to create a third repository that is the master, and make production and the test environments slaves to the master. Whenever a change needs to be made, it is entered into the master (usually located in the production environment), and it flows out via the async method of your choice to the all of the slave databases. Oracle has some built in facilities for replicating tables, although I think our version simply opened a db connection to the remote system and ran the update.

The upshot of that whole experience is that I never try and replicate between databases - I've always since managed to find a simpler solution :)

Of course, if you just have a few occasional changes, take a look at http://uptodater.sf.net.

Posted by Andy Pruitt on May 06, 2005 at 05:59 AM MDT #

You can enforce uniqueness with a unique index you don't have to have a primary key. The database will still do all the work for you, albeit a slight different error message.

One reason to not use user names as a primary key, for example, is they have a nasty habit of changing (marrige/divorce come to mind). Cascading a primary key change to child tables can sometimes be tricky depending on how much referential integrity you've built in. Regardless some code will need to be introduced to ensure the key change is cascaded properly so you don't end up with orphans. Clearly this is prone to errors as the complexity increases.

Posted by David on May 12, 2005 at 05:48 AM MDT #

10+ YEARS

TWITTER

RECENT ENTRIES

Tag Cloud

Keeping development and production databases in synch