Get ya Lipsync! Get ya red hot Lipsync! (UPDATED)

Moho allows users to write new tools and plugins. Discuss scripting ideas and problems here.

Moderators: Víctor Paredes, Belgarath, slowtiger

mkelley
Posts: 1647
Joined: Fri Nov 02, 2007 5:29 pm
Location: Sunny Florida

Post by mkelley »

Wes,

If you're talking about going through this and removing the popups that's what I'm about to do. Either let's compare notes or not reinvent the wheel <g>.

What I *really* need is to rethink our lipsync if I can get this thing working halfway decent. The problem we have is that we usually have three or four characters talking in a scene, and this approach (this script approach in this here thread :>) works with a single character audio file, unlike Papagayo which can handle multiple characters with ease (well, as easy as anything is to do in the dragging out dialog world of PG).

I started off by thinking that I'd edit the input audio files so they only contained one character, but that's enough of a PITA to make me stick with PG. But then it occurred to me that perhaps I could load up the data file containing all the dialog in each character's mouth track, and then just have an action or script which "turned off" the character talking at the appropriate points (right now I already have four separate mouth tracks that cover various emotions while talking, and a fifth track to cover "don't talk" is easy to create and then select).
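A rough sketch of that "turn off the character" idea, in Python rather than as a Moho script. It assumes the full-dialog data has already been read into a sorted list of `(frame, phoneme)` keys and that you know the frame ranges where this character is the one talking. The function name, the `rest` shape name, and the data layout are all made up for illustration.

```python
from bisect import bisect_right


def mute_outside(keys, speaking, rest="rest"):
    """Hold the `rest` mouth outside the ranges where this character talks.

    keys     -- sorted list of (frame, phoneme) covering the whole dialog
    speaking -- list of (start_frame, end_frame) ranges, end exclusive
    Returns a new key list; the input is left untouched.
    """
    out = []

    def emit(frame, name):
        # A later key on the same frame replaces the earlier one.
        if out and out[-1][0] == frame:
            out.pop()
        # Skip keys that would just repeat the current mouth shape.
        if not out or out[-1][1] != name:
            out.append((frame, name))

    frames = [f for f, _ in keys]
    emit(0, rest)
    for start, end in sorted(speaking):
        # Pick up whichever shape is already active when the range begins.
        i = bisect_right(frames, start) - 1
        if i >= 0:
            emit(start, keys[i][1])
        # Copy the keys that fall strictly inside the range.
        for f, name in keys:
            if start < f < end:
                emit(f, name)
        # Close the mouth again when the range ends.
        emit(end, rest)
    return out


if __name__ == "__main__":
    dialog = [(0, "rest"), (5, "AI"), (8, "O"), (12, "MBP"), (20, "E"), (25, "rest")]
    print(mute_outside(dialog, [(10, 22)]))
```

Each character would get its own list of speaking ranges, so the same full-dialog data can be reused for every mouth track.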

I have to see how easy the workflow would be, along with testing to make sure this thing really does work on the majority of files (so far my success rate for the MS thingee is 50% - worked with one, didn't work with another). I think that's the only real issue: whether my audio stuff (which is studio recorded and *ought* to be usable) will pass.

And we'll never give up PG entirely -- we do have overlapping dialog at times, as well as singing, and this approach here won't handle either one. But it could sure save a bucketload of time otherwise.
synthsin75
Posts: 9973
Joined: Mon Jan 14, 2008 11:20 pm
Location: Oklahoma

Post by synthsin75 »

Mike,

I couldn't get Liset to work on enough of my files to be of any use right now, but having absolutely no (or very few) spaces (and no punctuation) in the text input seems to help at times.

Overall this just isn't useful enough for me to justify reworking the dialogs. Sorry. But I'm willing to bet that you'll find it only slightly more useful.

Mmm, I just had a thought. With most cartoon lip-sync, it doesn't need to be so precise. I just looked at tutorial 5.1 on the AS automatic lip-sync. Overall it does a good job. The only phoneme I really miss is the 'O' (which when I do lip-sync is the same shape as 'U' and 'W'). It may not be so bad to just do the auto lip-sync then go through it once adding the 'O' (or whatever phonemes you want) where needed.

I just tested this out. Had to put one of the 'phoneme' layers inside a switch layer so that it wouldn't add the 'O' automatically. Mmm, that may not be so bad. Use Papagayo to make an 'O' track. :lol:

Then you'd just have to change the main switch to the switch containing the 'O' when needed.


Maybe too convoluted to work well though. Let me know what you think, Mike. :wink:

Post by mkelley »

synthsin75 wrote:But I'm willing to bet that you'll find it only slightly more useful.
Because of what files it works on? Or something else?

I have to run through some more files (and I have tons to try) to see if it's going to be workable. Maybe I'll try ten tonight and see what the percentages are.

I really need as precise lip sync as possible -- you've seen my stuff, and if I dumb it down any more than that I'm *really* in trouble <bg>. At least the lip sync makes it look like I'm doing something <g>. So I don't want to do anything that's LESS good than PG does.

Post by mkelley »

Ahhghgh -- you're right. It doesn't even work on half the files I've tried (not even on simple ones).

Sigh. Oh well, I guess there's no substitute for (my wife's) hard work. Well, at least she's useful for something (she isn't reading this, is she?) LOL.

Post by synthsin75 »

Yeah, that's what I found. I assume that's one of the reasons 7feet never cleaned up the dialogs. :wink:
heyvern
Posts: 7035
Joined: Fri Sep 02, 2005 4:49 am

Post by heyvern »

Are you guys saying it doesn't work on certain types of wav files? Do you know what format wav it does work with?

From a production viewpoint it would make sense to go to the extra effort to create alternate versions of the sound files just for this purpose. I had to do that recently with literally over a hundred MP3s. I just load 'em up in iTunes and convert to wav.

-vern

Post by synthsin75 »

Well it needs an uncompressed WAV, and it appears to need a good amount of variety in the wave form's amplitude to work at all.

Post by mkelley »

Vern,

I don't think it has much to do with the files per se -- our files are pristine, uncompressed, and about as good a quality as you can find. Now, if you're suggesting that perhaps I need to *downgrade* my files (perhaps save them as 8 bit, with less frequency response, etc.) that might be something to try, but the documentation for the MS component doesn't indicate you need lesser quality files.

Still, I'll give it a shot -- can't hurt to see if the MS stuff was meant for inferior audio (it might be -- the program itself tries to set up for onboard microphone recording at a low bitrate). I'll report back with any results.

Post by mkelley »

Um, you may be onto something.

I downgraded my files (from uncompressed, 16bit, stereo CD quality) to mono, 8bit 16hz and that seemed to work MUCH better. (Stereo to mono was a big help, but I think the best was going to 8bit). I still had a few problems but overall it's reading my files pretty good.
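For anyone who wants to batch the downgrade instead of doing it by hand in an audio editor, here is a minimal sketch using only Python's standard `wave` module. It covers just the two changes that helped most above (stereo to mono, 16-bit to 8-bit) and leaves the sample rate alone, since proper resampling needs a filter. The function name is made up, and it assumes plain uncompressed 16-bit PCM input.

```python
import array
import sys
import wave


def downgrade_wav(src_path, dst_path):
    """Write an 8-bit mono copy of a 16-bit PCM WAV file (same sample rate)."""
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM input")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # WAV sample data is little-endian; swap on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    # Mix down by averaging the channels of each frame.
    mono = [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ]

    # 16-bit signed -> 8-bit unsigned (8-bit WAV samples are centred on 128).
    eight_bit = bytes((s >> 8) + 128 for s in mono)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(1)
        dst.setframerate(rate)
        dst.writeframes(eight_bit)


if __name__ == "__main__" and len(sys.argv) == 3:
    downgrade_wav(sys.argv[1], sys.argv[2])
```

The straight bit shift adds no dithering, so the result will sound rough, which is fine if the copy only exists to feed the lipsync step.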

I'll play with it some more but at least it's worth exploring further (as I had pretty much given up on it).

Post by heyvern »

Oh I get it. The software needs the actual voices to be relatively "clear" to discern the spoken words. Downgrading might actually make it more effective by giving it less "choices". I'm just guessing.

I was thinking of trying this out now that I have a relatively reliable Windows PC.

-vern

Post by mkelley »

Vern,

It also might be that the 16bit CD files are too much data to process. The far simpler "radio quality" files definitely process faster (and without serious errors so far -- using my "quality" files caused my computer to lock up, blue screen, and the lset executable to crash in varying degrees of seriousness).

I'm going to make an effort today to go through one of our old shows and process all the audio just to see how much actually works. I'll report back here with the results.

Post by mkelley »

Ah -- I give up.

This might work just fine for some people, but for me it just won't cut it. It appears that the basic problem is the length of the file. Files shorter than 30 seconds (the shorter the better) work just fine (as long as they aren't stereo) but anything longer and it just won't parse.

I tried splitting up the same files that didn't work, and the pieces worked fine -- but trying to recombine them was a PITA (not only that, but the process of trying to decide where to split them just isn't very easy). And for all my animations (mostly text driven) 30 seconds just won't do (our scenes are nearly always over 30 seconds long).
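The recombining part could in principle be scripted. A minimal sketch, assuming each chunk's result has been read into a list of `(frame, phoneme)` keys counted from the start of that chunk, and that you noted where in the original audio each chunk began. The function name, the data layout, and the 24 fps default are assumptions, not anything the script in this thread provides.

```python
def merge_chunks(chunks, fps=24):
    """Stitch per-chunk lipsync keys back onto one timeline.

    chunks -- list of (start_seconds, keys); keys are (frame, phoneme)
              pairs numbered from the start of that chunk
    fps    -- project frame rate used to turn seconds into frames
    """
    merged = []
    for start_seconds, keys in sorted(chunks, key=lambda c: c[0]):
        # Shift every key by the chunk's position in the full audio.
        offset = round(start_seconds * fps)
        for frame, phoneme in keys:
            merged.append((frame + offset, phoneme))
    return merged


if __name__ == "__main__":
    parts = [
        (25.0, [(0, "AI"), (6, "rest")]),
        (0.0, [(0, "rest"), (12, "O")]),
    ]
    print(merge_chunks(parts))
```

This does nothing about choosing the split points, which is still the harder half of the job.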

It's really frustrating because otherwise the process is sound and works really well -- but it's back to Papagayo (another annoyance is this process uses the TH phoneme and I do believe actually gives better results than PG, even if just slightly).

Good luck to everyone else <g>.

Post by synthsin75 »

I figured as much. Oh well.

If you really want the 'TH', you could make your 'L' work for both. Just put the tongue to the bottom of the top teeth rather than the back. Then you can just change the Papagayo phoneme to 'L'. Mmm, I wonder if the PG library can be altered to do this automatically?
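To show what "automatically" might look like, here is a small Python sketch. It assumes the lookup boils down to a plain table from dictionary phonemes to mouth-shape names; the table entries, the `etc` fallback, and both function names are illustrative only and not taken from Papagayo's actual files.

```python
def with_th_as_l(mapping, th_phonemes=("TH", "DH"), target="L"):
    """Return a copy of the mapping with the 'th' sounds pointed at the L mouth."""
    updated = dict(mapping)
    for phoneme in th_phonemes:
        updated[phoneme] = target
    return updated


def mouth_shapes(phonemes, mapping, default="etc"):
    """Translate a list of dictionary phonemes into mouth-shape names."""
    return [mapping.get(p, default) for p in phonemes]


if __name__ == "__main__":
    # Hypothetical table: 'TH' normally falls into a catch-all shape.
    base = {"L": "L", "TH": "etc", "AH": "AI"}
    remapped = with_th_as_l(base)
    print(mouth_shapes(["TH", "AH", "L"], remapped))
```

The original table is left untouched, so you can keep both versions around and compare the results.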

I may have to look into that. :wink: