## Gurobi : modifying the linear relaxation at each node of the MIP - gurobi

### Estimate gaussian (mixture) density from a set of weighted samples

Assume I have a set of weighted samples, where each samples has a corresponding weight between 0 and 1. I'd like to estimate the parameters of a gaussian mixture distribution that is biased towards the samples with higher weight. In the usual non-weighted case gaussian mixture estimation is done via the EM algorithm. Does anyone know an implementation (any language is ok) that permits passing weights? If not, does anyone know how to modify the algorithm to account for the weights? If not, can some one give me a hint on how to incorporate the weights in the initial formula of the maximum-log-likelihood formulation of the problem?
Thanks!

I've just had the same problem. Even though the post is older, it might be interesting to someone else. honk's answer is in principle correct, it's just not immediate to see how it affects the implementation of the algorithm. From the Wikipedia article for Expectation Maximization and a very nice Tutorial, the changes can be derived easily.
If $v_i$ is the weight of the i-th sample, the algorithm from the tutorial (see end of Section 6.2.) changes so that the $gamma_{ij}$ is multiplied by that weighting factor.
For the calculation of the new weights $w_j$, $n_j$ has to be divided by the sum of the weights $\sum_{i=1}^{n} v_i$ instead of just n. That's it...

You can calculate a weighted log-Likelihood function; just multiply the every point with it's weight. Note that you need to use the log-Likelihood function for this.
So your problem reduces to minimizing $-\ln L = \sum_i w_i \ln f(x_i|q)$ (see the Wikipedia article for the original form).

Just a suggestion as no other answers are sent.
You could use the normal EM with GMM (OpenCV for ex. has many wrappers for many languages) and put some points twice in the cluster you want to have "more weight". That way the EM would consider those points more important. You can remove the extra points later if it does matter.
Otherwise I think this goes quite extreme mathematics unless you have strong background in advanced statistics.

I was looking for a similar solution related to gaussian kernel estimation (instead of a gaussian mixture) of the distribution.
The standard gaussian_kde does not allow that but I found a python implementation of a modified version here
http://mail.scipy.org/pipermail/scipy-user/2013-May/034580.html

### Polynomial curve fitting

I want to implement polynomial curve-fitting using the least-squares technique but with various error functions i.e. not just least squares. Is there some way to do that in MATLAB? (I want to compare the results for different error functions. I also want to use regularisation for which I need to change the error function).
Can you share any resources (MATLAB/C++) that could provide some help on how to implement curve fitting without in-built function? I could only find those using gaussion elimination - is that the same as least squares fitting?

Gaussian elimination is not the same as least-squares fitting. The sense in which it is not the same as least-squares fitting resembles the sense in which gasoline is not the same as driving.
Gaussian elimination is a technique to solve a linear system. Least-squares solves a linear system and does some other things, so it can use Gaussian elimination.
In general, as far as I know, least-squares fitting in the generalized Moore-Penrose sense (see sect. 13.6 here; caution, heavy reading) is the canonical linear way to fit parameters. If you wish to use an unrelated error function, then you will have either (a) to depart from matrix techniques or (b) use less efficient iterative matrix techniques which do not approach the power of Moore-Penrose.
I realize that this is probably not the answer you wanted, but I believe that it is the answer. If you find out differently, let us know.

polynomial curve fitting is the first step towards learning "machine learning". My advise is to try least square first and then understand the probabilistic treatment of curve fitting. You can find this in (Bishop's Book). The summary is, you can assume that target value(t) for an input value (x) comes from a gaussian distribution. So error can be minimized by taking the maximum likelihood of the target value. This looks easy at the begining but the intuitive meaning has many insights. I would recommend you try this using matlab or r.

### Can we use dijkstra algorithm to find any cycles

Can we use Dijkstra's algorithm to find cycles???
Negative cycles
Positive cycles
If we can what, are the changes we have to do?

1) Dijkstra's doesn't work on graphs with negative edges because you can (possibly) find a minimum distance of negative infinity.
2) Um, you normally run it on graphs with cycles (otherwise, you might as well be traversing a tree), so it can handle them just fine.
If your real question is just about finding cycles, look at Finding all cycles in graph

No We cant use Dijkstra algorithm if negative cycles exist as the algorithm works on the shortest path and for such graphs it is undefined.Once you get to a negative cycle, you can bring the cost of your "shortest path" as low as you wish by following the negative cycle multiple times.
This type of restriction is applicable to all sort of algorithms to find shortest path in a graph and this is the same reason that prohibits all negative edges in Dijkstra.
You may modify Dijkstra to find the cycles in the graph but i do not think it is the best practice.
Rather you can possibility Use:
Tarjan's strongly connected components algorithm(Time complexity -O(|E| + |V|))
or Kosaraju's algorithm (uses DFS and is a linear time alogoritham)
https://en.wikipedia.org/wiki/Strongly_connected_component
Hope I answered your question.

### Dealing with outlier in a MFFC with DTW setup

I have a small command recognition system in which the user first records his commands then later the system tries to recognize them . The front end's feature vector are MFCC's coefficients. The back end does recognition using DTW to align these feature vector and outputting a score ( 0 -> commands are equal). The problem with this setup is distinguishing commands (the ones which the user recorded) from other words. Picking a maximum score as threshold for which commands are recognized doesn't give good results. I looked up LDA and PCA with the purpose of projecting the recorded features to a different feature space where they could be more separable. Each recorded command is a class that has as samples feature vectors from the front-end associated with the frame of that command. From that I computed the transformation required for LDA, and applied the transformation to each set of resulting MFCC coefficients. That didn't give me a separability between recorded commands and urecorded commands.
My questions would be:
is the approach in applying LDA the wrong one?
are there other methods more suited for my setup (MFCC + DTW)?
Any help or guidance is greatly appreciated.
Thank you

The problem with this setup is distinguishing commands that were not recorded.
You probably want to express better that you want to separate keywords you are looking for from all other possible words. It's not clear what do you mean by "that were not recorded"
is the approach in applying LDA the wrong one?
It's not wrong, it's meaningless. The PCA optimizes different properties and by no mean could improve separation.
Picking a maximum score as threshold for which commands are recognized doesn't give good results.
This approach is not the best possible one but it should work relatively well. It was proven over ages. You probably just made a mistake in it's implementation or testing or there is some other bug. I suggest you to revisit it.
The only thing you need to know is that threshold has to be dependent on template keyword. So for different template keywords threshold has to be different. A single threshold will not work.

### Implementation issues in 2-Satisfiability problem

I want to implement 2-SAT problem for 100000 literals. So there would be 200000 vertices. So I am stuck on having a array of all reachable vertices from each vertex, space complexity of O(200000^2) which is infeasible So please suggest a solution for this. And please throw some light on efficient implementation of 2-SAT problem.

From wikipedia:
... 2-satisfiability can be solved in polynomial time. As Aspvall, Plass & Tarjan (1979) observed, a 2-satisfiability instance is solvable if and only if every variable of the instance belongs to a different strongly connected component of the implication graph than the negation of the same variable. Since strongly connected components may be found in linear time by an algorithm based on depth first search, the same linear time bound applies as well to 2-satisfiability.
I won't pretend to understand most of that paragraph, but it would appear that there is an algorithm which can be used to solve the 2-SAT problem, and it is described within that referenced document (A linear-time algorithm for testing the truth of certain quantified boolean formulas). It can apparently be purchased online for about \$20 USD. I'm Not sure if that's helpful or not, but there it is!
update: A free PDF of the same document can be found here. Credit goes to liori for the find.

This whole thread is a bit messed up. Yes, one can solve 2-sat in linear time, but no - you can not solve it for that many variables. The time to solve 2-sat is linear with respect to the number of the implication, which for 200 000 variables could reach up to (200000*199999)/2 and furthermore if you use this solution you will need about the same amount of memory. There is another solution(not using strongly connected components that is slower but doesn't need that much memory).