This is where we are today. You don't know the ancient population, and you don't have its exact DNA profile. You don't know their exact numbers, you don't know their exact descendants, and you don't have the exact number of generations. You don't know who married whom, and you have no idea whether only direct descendants are in your sample. You don't have the exact DNA profiles either. Yet this is the model people have today, and it is this model that they are taking, applying mathematics to, getting an answer from, and then writing global papers with global claims about who came to India, in what time frame, and from where. I hope I have given you a flavor of the mathematical and technological issues.
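To make the dependence on assumed sources concrete, here is a minimal Python sketch of the simplest two-source admixture estimate. It assumes, hypothetically, that the marker frequencies of both ancestral sources are known exactly, which is precisely what the passage above says we do not have. It fits target ≈ α·A + (1 − α)·B by least squares over markers: setting the derivative of Σ(t − b − α(a − b))² to zero gives α = Σ(t − b)(a − b) / Σ(a − b)². The function name and all frequencies are made up for illustration; this is not the method of any particular paper.

```python
def estimate_admixture_proportion(target, source_a, source_b):
    """Least-squares alpha for target ~ alpha * source_a + (1 - alpha) * source_b.

    Each argument is a list of marker frequencies, one entry per marker.
    The result is clipped to [0, 1] because a mixing proportion cannot leave that range.
    """
    num = 0.0
    den = 0.0
    for t, a, b in zip(target, source_a, source_b):
        d = a - b
        num += (t - b) * d
        den += d * d
    if den == 0.0:
        raise ValueError("sources identical at every marker; alpha is not identifiable")
    return min(1.0, max(0.0, num / den))

# Hypothetical frequencies at three markers
target = [0.5, 0.4, 0.3]
source_a = [0.8, 0.6, 0.2]
source_b = [0.2, 0.2, 0.4]        # one guess at the second ancestral source
source_b_alt = [0.3, 0.1, 0.5]    # a different, equally hypothetical guess

print(estimate_admixture_proportion(target, source_a, source_b))
print(estimate_admixture_proportion(target, source_a, source_b_alt))
```

With the same target population, swapping in a different assumed second source moves the estimate from 0.5 to about 0.525. The output is only as good as the source profiles you assumed.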

I am deconstructing the research methodology and saying there are problems everywhere. You cannot take these results and start acting on them. These studies assume statistical models, assume parameters, and do some curve fitting, and there is limited predictability in such things. That is the admixture problem. On the other side, there is one more problem, called PCA. PCA stands for principal component analysis. In that analysis, you build a matrix in which each row is a geographical region: region 1, region 2, region 3. For example, you start from southern India, with this village, and work your way north, village by village. In each one you look at the different markers, generate a profile, and say 30% of the people carry this marker, 20% carry that marker, and so on. You identify the various markers people are carrying and you get a matrix. Then you do something called singular value decomposition. How many engineers are here? Some are engineers, so you might have heard of singular value decomposition somewhere in your past. Once you run this mathematical algorithm, it gives you a set of numbers called principal components.
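For readers who want the algebra, here is the standard textbook form of PCA via singular value decomposition, written for a region-by-marker frequency matrix. The notation ($X$, $n$ regions, $m$ markers, $t_{ik}$) is introduced here for illustration and is not taken from any specific paper:

$$
\begin{aligned}
X &\in \mathbb{R}^{n \times m}, \quad x_{ij} = \text{frequency of marker } j \text{ in region } i,\\
\tilde{X} &= X - \mathbf{1}\,\bar{\mathbf{x}}^{\top}, \quad \bar{x}_j = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} x_{ij},\\
\tilde{X} &= U \Sigma V^{\top}, \quad \sigma_1 \ge \sigma_2 \ge \dots \ge 0,\\
t_{ik} &= (\tilde{X} V)_{ik} = \sigma_k\, u_{ik} \quad \text{(score of region } i \text{ on component } k\text{)},\\
\text{share of variance on PC}_k &= \frac{\sigma_k^{2}}{\sum_{l} \sigma_l^{2}}.
\end{aligned}
$$

Plotting $(t_{i1}, t_{i2})$ for every region gives the PC1 versus PC2 picture. Because the column means $\bar{x}_j$ and the singular vectors are computed from whichever rows are in $X$, adding or removing regions changes the axes themselves, not just the points plotted on them.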

So basically, David Reich takes the two largest principal components, PC1 and PC2, and places these regions on that plot to see where each region fits, and then he gets a gradient. Based on that gradient, he says the northern Indian population is closer to Central Asia and the southern Indian population is an isolate; that is how he got ANI and ASI. However, what they have done to get a gradient is this: in the southern Indian sample, they have included the Andamanese. They included Andamanese DNA along with the southern Indian sample, so that a clustering becomes possible that creates an artificial gradient. The Andamanese stopped mixing with the mainstream population 40,000 to 50,000 years ago, so why would you include them in today's narrative of who we are, even if you are interested in 5,000-year-old data? The Andamanese simply shouldn't be there in the data. So you skewed your data, skewed the numbers in your matrix, so that you get the kind of result that does what you want. So whether it is admixture or PCA, I am claiming that one has got to do great diligence: why did you put in those numbers, why did you include the Andamanese, why did you choose a model that supports a certain argument? So many questions can be raised. A good peer review would do all of these things. Unfortunately, as I told you, this is a multidisciplinary field, and nobody has expertise spanning all of it. The person who is an expert in biology has no clue about mathematics, and the person who knows mathematics doesn't know biology. So who is going to peer-review these papers? This is the problem: the papers get accepted without sufficient peer review, and all kinds of issues arise. That brings me to the close of this section.
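The general point that the leading axis depends on which samples go into the matrix can be checked on toy data. Below is a minimal pure-Python sketch that computes PC1 by power iteration (a stand-in for a library SVD; it converges to the same direction as the first right singular vector) for four hypothetical "mainland" regions with two markers, first alone and then with one hypothetical distant group added. All numbers and names are invented; this illustrates the mathematics only and does not reproduce any published analysis.

```python
import math

def pc1(rows, iterations=500):
    """Return (axis, scores) for the first principal component of `rows`.

    Uses power iteration on X^T X of the column-centered matrix, which
    converges to the first right singular vector of the centered data.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # m x m scatter matrix; dividing by n would not change the eigenvector
    scatter = [[sum(c[a] * c[b] for c in centered) for b in range(m)] for a in range(m)]
    axis = [1.0] * m  # start vector, assumed not orthogonal to the top eigenvector
    for _ in range(iterations):
        nxt = [sum(scatter[a][b] * axis[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("no variation in the data")
        axis = [x / norm for x in nxt]
    # PC signs are arbitrary; make the largest loading positive for readability
    if axis[max(range(m), key=lambda j: abs(axis[j]))] < 0:
        axis = [-x for x in axis]
    scores = [sum(c[j] * axis[j] for j in range(m)) for c in centered]
    return axis, scores

# Hypothetical marker frequencies (marker 1, marker 2) for four mainland regions
mainland = [[0.40, 0.30], [0.45, 0.30], [0.50, 0.30], [0.55, 0.30]]
# One hypothetical distant group that differs mainly in marker 2
distant = [0.475, 0.90]

axis_a, scores_a = pc1(mainland)
axis_b, scores_b = pc1(mainland + [distant])
print("without distant group:", [round(x, 3) for x in axis_a], [round(s, 3) for s in scores_a])
print("with distant group:   ", [round(x, 3) for x in axis_b], [round(s, 3) for s in scores_b])
```

Without the extra row, PC1 runs along marker 1 and spreads the four regions apart. With it, PC1 swings to marker 2 and the four regions collapse to a single PC1 score, so what the main axis "means" is decided by which samples were put in.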

I would like to give you one example that illustrates circular logic. It is from a 2015 paper from the University of California, Berkeley, in the journal Language. These people set out to fuse linguistics and genetics. They wanted to apply their knowledge of linguistics and their knowledge of genetics and see whether some of their models fit, including the Aryan invasion and so on. They took a list of 200 words and did something very strange. All these labels are the various peoples; you can't read them from where you are, but trust me, I can read them from here. The links between them say how genetically close one group is to another, and the black bars show how close they are. For example, this one is far from this one, and this one is far from that one, while these are very closely related. These black bars are the ancestry constraints, the clade constraints. In addition, they took time bars, that is, time constraints. The linguistic model gives you dates: how old Sanskrit is, when it diverged from some other language. So Vedic Sanskrit they put at around 3,000 years, or some such figure. The next closest to that is Hittite, over here; nobody speaks Hittite anymore. Ancient Greek is over here, and other ancient languages are over here. So they took the time constraints given by linguistic models and the genetic constraints given by David Reich and others.

Then they set it up as a mathematical problem, check whether the model converges, and say, "Aha! It converges, and it shows closeness to the steppe hypothesis." Unfortunately, the whole thing is a circular argument. The genetic model is a circular argument, the linguistic model is a self-fulfilling circular argument, and the whole thing is an exercise in mathematics with no bearing on reality at all. But these papers get published. So my conclusion has not changed from last year: genetic studies use preconceived models and markers that constrain the results, so those results are not primary evidence. They can only serve as supporting evidence. Just as I said last year, one has to check the sensitivity of the results to population size, composition, and assumptions. What happens if I take a few pieces of data out and put some other pieces of data in? How are your results going to change? As an engineer, that is what I do. When my team comes and tells me, "Here is a model that is working beautifully," I do the diligence and say, "All right, I am going to remove this data and redo the study. Is the conclusion similar? How robust is your conclusion?" If I take these pieces of data out and the model converges to an entirely different answer, that means your model depends strongly on those few data points. You see what I am saying? That is sensitivity, and you need to study it.
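The remove-the-data-and-redo check described above is usually called leave-one-out (jackknife-style) sensitivity analysis. A minimal Python sketch follows. It uses a straight-line least-squares slope as a stand-in for "the model" (a deliberate simplification; the genetic models under discussion are far more complex), refits with each point removed, and reports how far the answer moves. The data and function names are hypothetical, with one deliberately influential point.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def leave_one_out(xs, ys):
    """Refit without each point in turn; return the full fit and the per-point shifts."""
    full = fit_slope(xs, ys)
    shifts = []
    for i in range(len(xs)):
        slope_i = fit_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        shifts.append(slope_i - full)
    return full, shifts

# Hypothetical data: flat except for the last point
xs = [0, 1, 2, 3, 4]
ys = [0, 0, 0, 0, 4]
full, shifts = leave_one_out(xs, ys)
most_influential = max(range(len(xs)), key=lambda i: abs(shifts[i]))
print(f"full-data slope = {full:.2f}")
for i, s in enumerate(shifts):
    print(f"drop point {i}: slope changes by {s:+.2f}")
print("most influential point:", most_influential)
```

Dropping the last point alone takes the slope from 0.8 to 0: the "trend" rests on a single observation, which is exactly the kind of fragility a sensitivity study is meant to expose.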

Then there is composition. What is composition? Am I going to take data from an IT park in Gurgaon, or am I going to say, "You had better be living in the same place as your grandfather for me to take your data"? That is one issue. Then there is size: how many people am I going to admit? If I am going to talk about the Indian population, am I going to take 1,000 Brahmins, about 10 Shudras, and a few others, and say I have a genetic profile of Indians? That is a problem too. What percentage of each group are you going to take? Because of endogamy, we have certain differences among us. So what are you going to take? All kinds of issues are there, and you need to be careful. You need to be careful about the temptation to line up mathematical numbers behind a narrative, and you must keep subjective biases from creeping into the result. The reason is this. As I said last time, suppose that after all this convergence analysis you go and give the professor all your numbers: "I got all these numbers. This one is closely related to this, this one is closely related to that, and the difference between this group of people and that group of people is 0.001." And you say that 0.001 is enough for you to differentiate these populations. It is just a number. My question is, how did you calibrate that? Why is 0.001 significant in your data? You need to talk to me about scaling and about the significance of the numbers before you come and say that this much resolution is enough to say these people are different from those. That is a problem in the ANI-ASI model.
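One standard way to ask "why is 0.001 significant in your data?" is a permutation test: shuffle the group labels many times and see how often chance alone produces a difference at least as large as the one observed. The sketch below is a generic two-sided permutation test on invented per-sample values; it is not the procedure used in any of the papers discussed, and a real analysis would also have to account for sample size, composition, and how the statistic is scaled. The function name and data are hypothetical.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # add-one correction keeps the estimate away from an impossible p = 0
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical values: means differ by 0.001, but the spread within each group is ~0.4
noisy_a = [0.10, 0.20, 0.30, 0.40, 0.50]
noisy_b = [0.101, 0.201, 0.301, 0.401, 0.501]
# Hypothetical values: clearly separated groups
clear_a = [0.10, 0.11, 0.12, 0.13, 0.14]
clear_b = [0.50, 0.51, 0.52, 0.53, 0.54]

print("tiny difference, large spread:", permutation_p_value(noisy_a, noisy_b))
print("large difference, small spread:", permutation_p_value(clear_a, clear_b))
```

The first p-value comes out large, meaning a 0.001 gap is well within what random relabeling produces; the second comes out small. Whether a gap means anything depends on the spread of the data and the sample sizes, not on the raw number alone.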

You had to include the Andamanese to create an artificial gradient between North Indians and South Indians. If you throw out the Andamanese DNA, there is no genetic difference between North Indians and South Indians. I suppose a pastoral people... So you see what I am saying: one has to be extremely careful when doing these kinds of studies. A good critic is going to check all of these things and figure out what is happening before accepting any of them. Unfortunately, in our country, the minute a paper comes from Harvard with 90 signatories, or two thousand signatories, it is trumpeted in the news immediately: "Aryan invasion proven," and so on. How are journalists equipped to do any of this analysis? So we have a very big problem. The next time somebody talks about these things, I hope you are able to stand up and rebut them on the basis of some of these points.