I'm not sure I understand your goals.
Are you just trying to standardize the data to make it searchable/combine-able?
Or are you trying to track the changes and differences over time?
Either way, how is the data getting updated? If it's really a person editing
record by record, then a UI like MediaWiki might be useful. But at the end of
the day, it's still one at a time and there's no API.
I think the OSM angle could be useful, especially if you can hook into it to
resolve the canonical address for your (probably mangled) address. Assuming
you can match *good enough*, it would simplify a lot of things and lend itself
to pushing updates back pretty easily. I think you could do the same
read-only lookups using Google's location searches.
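
To make the "match good enough" idea concrete, here is a minimal sketch in Python of normalizing an address before comparing records from two agencies. The function name and the abbreviation table are hypothetical and far from complete; a real pass would lean on a geocoder or a proper address parser rather than this.

```python
import re

# Hypothetical abbreviation table for illustration only; a real list
# would be much longer and region-specific.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "boulevard": "blvd",
    "road": "rd",
    "suite": "ste",
    "northwest": "nw",
    "northeast": "ne",
    "southwest": "sw",
    "southeast": "se",
}


def normalize_address(raw):
    """Reduce an address to a lowercase, punctuation-free comparison key."""
    text = raw.lower().replace(".", "")      # "N.W." becomes "nw"
    text = re.sub(r"[^\w\s]", " ", text)     # other punctuation becomes a space
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def same_address(a, b):
    """Treat two raw addresses as a match when their keys are identical."""
    return normalize_address(a) == normalize_address(b)
```

Two records would then be candidates for the same master id when `same_address` returns true; anything that still differs after normalization is what you would hand to OSM or Google to resolve.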
If you're trying to track changes/diffs over time, that's a nightmare. I'm
diving into some of that with web2project (think: project baselining &
drifting) and it makes my head hurt. :(
Regardless, I think unless you have a compelling reason *not* to use a
standard database, you should go for it. The API is a separate consideration
from storage anyway. If you can convince the BreweryDB guys to share their
mindset, there's probably a lot of overlap in concept, if not in content.
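
As a rough sketch of what "a standard database" could look like here, the Python below uses SQLite with one table for the current record and one for prior versions, filled in by a trigger. Every table, column, and trigger name is an assumption made up for illustration, and the sample store names are invented.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outlet (
    id      INTEGER PRIMARY KEY,   -- our own master identifier
    name    TEXT NOT NULL,
    address TEXT NOT NULL
);

CREATE TABLE outlet_history (
    outlet_id  INTEGER NOT NULL REFERENCES outlet(id),
    name       TEXT NOT NULL,
    address    TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Copy the old values aside whenever a record changes.
CREATE TRIGGER outlet_keep_history
AFTER UPDATE ON outlet
BEGIN
    INSERT INTO outlet_history (outlet_id, name, address)
    VALUES (OLD.id, OLD.name, OLD.address);
END;
""")

# A store is sold: the address stays the same but the name changes.
conn.execute(
    "INSERT INTO outlet (name, address) VALUES (?, ?)",
    ("Joe's Market", "12 main st"),
)
conn.execute(
    "UPDATE outlet SET name = ? WHERE id = ?",
    ("Main Street Grocery", 1),
)

current = conn.execute("SELECT name, address FROM outlet WHERE id = 1").fetchone()
history = conn.execute(
    "SELECT outlet_id, name, address FROM outlet_history"
).fetchall()
```

This only covers the simple "keep the previous row" case. It does not attempt the harder diff-over-time problem, and whatever API sits in front of it is a separate decision.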
kc
On 09/27/2012 05:01 PM, Tac Tacelosky wrote:
> I have data from a bunch of different government agencies regulating
> retail outlets. Most (but not all) of them have some sort of internal
> identifier, but it's maddening to try to get reports with data where
> the name or address is slightly different, and of course there's no
> master id.
>
> So I'm trying to put together a master database, with our own
> identifier. I figure we'll go through some address standardization
> for the first pass. It gets more complicated, though, when a store's
> name changes (e.g. when a business is sold, the address stays the same
> but the name doesn't), or a store moves, or is added, whatever. We're
> talking about opening the data up under the Open Database License, and
> I'd like to use some sort of standard tool.
>
> MediaWiki came to mind first. It'd be a great UI, and would allow
> people to update the site with specific information like if the store
> was no longer in business. But the majority of the data updates would
> come from merging lists from various agencies. On the huge plus side,
> we have an API for access and updating, and a built-in history tool.
>
> OpenStreetMap was my second idea. Many of the same benefits, but a
> bit more awkward to work with. The huge advantage is that the data
> relevant to OSM we get from the agencies could be pushed back to the
> OSM database. But I'm not sure how to handle our project-specific
> data (e.g. violations) which is clearly of no interest to OSM.
>
> I'm leaning toward MediaWiki, and was wondering if anyone had any
> experience with managing what is more often stored in a relational
> database. I'm trying to avoid writing a full site for this, but it's
> tempting to start with name, address, phone, etc., but then I'd have
> to manage all the history and wiki nature of this. So I'm looking for
> a combination of the free-flow wiki-style data and a structured
> database.
>
> I worked on something a while ago where we just enforced headers,
> which roughly mapped to each field. But that felt a bit hackish.
> It's more like I want a form within a MediaWiki page.
>
> We will write some plugins for linking to data that comes directly
> from a database, like the violations themselves, but the common data
> is what I'm thinking about now.
>
> Any pointers?
>
> Thanks!
>
> Tac
>
--
D. Keith Casey, Jr.
http://CaseySoftware.com
--
You received this message because you are subscribed to the Google
Group: "Washington, DC PHP Developers Group" - http://www.dcphp.net