Thursday, 4 October 2012

Re: [dcphp-dev] Master Retailer Database -- MediaWiki? OSM?

Actually, the MediaWiki API isn't too bad. Lots of bots have been
written against it, and at the moment I'm looking at having each
changeset managed by a bot. I'm not 100% sure it'll work, but it's
been fun to play around with the idea. And MediaWiki infoboxes
provide standardization for different record types.
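
To make the infobox idea a bit more concrete, here's a rough sketch of
how a bot might turn one record into infobox wikitext and package it
as an edit. The template name "Infobox retailer", the field names, and
both helper functions are made up for illustration. Nothing is sent to
a wiki here; a real bot would first have to log in and fetch an edit
token, and the "TOKEN" value below is just a placeholder.

```python
def render_infobox(template, fields):
    """Render one record as MediaWiki template (infobox) wikitext."""
    lines = ["{{" + template]
    for key, value in fields.items():
        # A literal pipe would end the template parameter early,
        # so swap it for the equivalent HTML entity.
        safe = str(value).replace("|", "&#124;")
        lines.append("| " + key + " = " + safe)
    lines.append("}}")
    return "\n".join(lines)


def build_edit_params(title, wikitext, summary, token):
    """Build the POST parameters for an action=edit request (not sent)."""
    return {
        "action": "edit",
        "title": title,
        "text": wikitext,
        "summary": summary,
        "bot": "1",        # flag the change as a bot edit
        "token": token,    # placeholder; a real token comes from the wiki
        "format": "json",
    }


# Hypothetical record, as it might come out of a merged agency list.
record = {
    "name": "Example Market",
    "address": "123 Main St",
    "phone": "555-0100",
}

page_text = render_infobox("Infobox retailer", record)
params = build_edit_params(
    "Example Market", page_text, "Merge agency list", "TOKEN"
)
print(page_text)
```

Keeping the rendering separate from the request means the same record
could be diffed against the current page text before deciding whether
an edit is needed at all.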

It also makes media management (mostly photos, but perhaps someday
video) easier in some ways.

It's amazing, though, if you dive into the MediaWiki source code --
it's not that elegant! Lots of globals, the parser is made up of a
lot of regexes, etc. But it powers one of the most successful sites
on the web.

The biggest downside to using MediaWiki is that at the moment, the
data I'm collecting has very little prose. So it might not be the
right answer, but like I said, it's been fun to poke around and see
how it might be used in this or some other project.

Thanks for the feedback.

Tac

On Thu, Oct 4, 2012 at 5:53 PM, D Keith Casey Jr
<keith@caseysoftware.com> wrote:
>
> I'm not sure if I understand your goals..
>
> Are you just trying to standardize the data to make it
> searchable/combine-able?
>
> Or are you trying to track the changes and differences over time?
>
> Either way, how is the data getting updated? If it's really a person editing
> record by record, then a UI like MediaWiki might be useful. But at the end
> of the day, it's still one at a time and there's no API.
>
> I think the OSM angle could be useful, especially if you can connect in to
> resolve the canonical address for your (probably mangled) address. Assuming
> you can match *good enough* it would simplify a lot of things and lend
> itself to pushing back updates pretty easily. I think you could do the same
> read-based actions using Google's location searches.
>
> If you're trying to track changes/diffs over time.. that's a nightmare. I'm
> diving into some of that with web2project (think: project baselining &
> drifting) and it makes my head hurt. :(
>
>
> Regardless, I think unless you have a compelling reason *not* to use
> a standard database, you should go for it. The API is a separate
> consideration from storage anyway. If you can convince the BreweryDB guys to
> share their mindset, there's probably a lot of overlap.. in concept, not
> content.
>
> kc
>
>
> On 09/27/2012 05:01 PM, Tac Tacelosky wrote:
>>
>> I have data from a bunch of different government agencies regulating
>> retail outlets. Most (but not all) of them have some sort of internal
>> identifier, but it's maddening to try to get reports with data where
>> the name or address is slightly different, and of course there's no
>> master id.
>>
>> So I'm trying to put together a master database, with our own
>> identifier. I figure we'll go through some address standardization
>> for the first pass. It gets more complicated, though, when a store's
>> name changes (e.g. when a business is sold, the address stays the same
>> but the name doesn't), or a store moves, or a new one is added, whatever.
>> We're talking about opening the data up under the Open Database License,
>> and I'd like to use some sort of standard tool.
>>
>> MediaWiki came to mind first. It'd be a great UI, and would allow
>> people to update the site with specific information like if the store
>> was no longer in business. But the majority of the data updates would
>> come from merging lists from various agencies. On the huge plus side,
>> we have an API for access and updating, and a built-in history tool.
>>
>> OpenStreetMap was my second idea. Many of the same benefits, but a
>> bit more awkward to work with. The huge advantage is that the data
>> relevant to OSM we get from the agencies could be pushed back to the
>> OSM database. But I'm not sure how to handle our project-specific
>> data (e.g. violations) which is clearly of no interest to OSM.
>>
>> I'm leaning toward MediaWiki, and was wondering if anyone had any
>> experience with managing what is more often stored in a relational
>> database. I'm trying to avoid writing a full site for this, but it's
>> tempting to start with name, address, phone, etc., but then I'd have
>> to manage all the history and wiki nature of this. So I'm looking for
>> a combination of the free-flow wiki-style data and a structured
>> database.
>>
>> I worked on something a while ago where we just enforced headers,
>> which roughly mapped to each field. But that felt a bit hackish.
>> It's more like I want a form within a MediaWiki page.
>>
>> We will write some plugins for linking to data that comes directly
>> from a database, like the violations themselves, but the common data
>> is what I'm thinking about now.
>>
>> Any pointers?
>>
>> Thanks!
>>
>> Tac
>>
>
>
> --
> D. Keith Casey, Jr.
> http://CaseySoftware.com

--
You received this message because you are subscribed to the Google
Group: "Washington, DC PHP Developers Group" - http://www.dcphp.net
To post, send email to washington-dcphp-group@googlegroups.com
To unsubscribe, send email to washington-dcphp-group+unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/washington-dcphp-group?hl=en
